Preface


This book is meant as a short text in linear algebra for a one-term 
course. Except for an occasional example or exercise the text is logically 
independent of calculus, and could be taught early. In practice, I expect 
it to be used mostly for students who have had two or three terms of 
calculus. The course could also be given simultaneously with, or im- 
mediately after, the first course in calculus. 

I have included some examples concerning vector spaces of functions, 
but these could be omitted throughout without impairing the under- 
standing of the rest of the book, for those who wish to concentrate 
exclusively on Euclidean space. Furthermore, the reader who does not
like n = n can always assume that n = 1, 2, or 3 and omit other interpretations. However, such a reader should note that using n = n simplifies
some formulas, say by making them shorter, and should get used to this
as rapidly as possible. Furthermore, since one does want to cover both
the case n = 2 and n = 3 at the very least, using n to denote either
number avoids very tedious repetitions.

The first chapter is designed to serve several purposes. First, and 
most basically, it establishes the fundamental connection between linear 
algebra and geometric intuition. There are indeed two aspects (at least) 
to linear algebra: the formal manipulative aspect of computations with 
matrices, and the geometric interpretation. I do not wish to prejudice 
one in favor of the other, and I believe that grounding formal manipula- 
tions in geometric contexts gives a very valuable background for those 
who use linear algebra. Second, this first chapter gives immediately 
concrete examples, with coordinates, for linear combinations, perpendicu- 
larity, and other notions developed later in the book. In addition to the 
geometric context, discussion of these notions provides examples for 
subspaces, and also gives a fundamental interpretation for linear equations. Thus the first chapter gives a quick overview of many topics in
the book. The content of the first chapter is also the most fundamental 
part of what is used in calculus courses concerning functions of several 
variables, which can do a lot of things without the more general ma- 
trices. If students have covered the material of Chapter I in another 
course, or if the instructor wishes to emphasize matrices right away, then 
the first chapter can be skipped, or can be used selectively for examples 
and motivation. 

After this introductory chapter, we start with linear equations, 
matrices, and Gauss elimination. This chapter emphasizes computational 
aspects of linear algebra. Then we deal with vector spaces, linear maps 
and scalar products, and their relations to matrices. This mixes both the 
computational and theoretical aspects. 

Determinants are treated much more briefly than in the first edition, 
and several proofs are omitted. Students interested in theory can refer to 
a more complete treatment in theoretical books on linear algebra. 

I have included a chapter on eigenvalues and eigenvectors. This gives 
practice for notions studied previously, and leads into material which is 
used constantly in all parts of mathematics and its applications. 

I am much indebted to Toby Orloff and Daniel Horn for their useful 
comments and corrections as they were teaching the course from a pre- 
liminary version of this book. I thank Allen Altman and Gimli Khazad 
for lists of corrections. 
CHAPTER I


Vectors 


The concept of a vector is basic for the study of functions of several 
variables. It provides geometric motivation for everything that follows. 
Hence the properties of vectors, both algebraic and geometric, will be 
discussed in full. 

One significant feature of all the statements and proofs of this part is 
that they are neither easier nor harder to prove in 3-space than they are 
in 2-space. 


I, §1. Definition of Points in Space


We know that a number can be used to represent a point on a line, 
once a unit length is selected. 

A pair of numbers (i.e. a couple of numbers) (x, y) can be used to
represent a point in the plane.

These can be pictured as follows:


(a) Point on a line    (b) Point in a plane

Figure 1

We now observe that a triple of numbers (x, y, z) can be used to
represent a point in space, that is 3-dimensional space, or 3-space. We
simply introduce one more axis. Figure 2 illustrates this.


Figure 2 (the point (x, y, z) drawn against the coordinate axes)


Instead of using x, y, z we could also use $(x_1, x_2, x_3)$. The line could
be called 1-space, and the plane could be called 2-space.

Thus we can say that a single number represents a point in 1-space. 
A couple represents a point in 2-space. A triple represents a point in 3- 
space. 

Although we cannot draw a picture to go further, there is nothing to
prevent us from considering a quadruple of numbers

$$(x_1, x_2, x_3, x_4)$$

and decreeing that this is a point in 4-space. A quintuple would be a
point in 5-space, then would come a sextuple, septuple, octuple, ....

We let ourselves be carried away and define a point in n-space to be
an n-tuple of numbers

$$(x_1, x_2, \ldots, x_n)$$

if n is a positive integer. We shall denote such an n-tuple by a capital
letter X, and try to keep small letters for numbers and capital letters for
points. We call the numbers $x_1, \ldots, x_n$ the coordinates of the point X.
For example, in 3-space, 2 is the first coordinate of the point (2, 3, -4),
and -4 is its third coordinate. We denote n-space by $\mathbf{R}^n$.

Most of our examples will take place when n = 2 or n = 3. Thus the 
reader may visualize either of these two cases throughout the book. 
However, three comments must be made. 

First, we have to handle n = 2 and n = 3, so that in order to avoid a 
lot of repetitions, it is useful to have a notation which covers both these 
cases simultaneously, even if we often repeat the formulation of certain 
results separately for both cases. 

Second, no theorem or formula is simpler by making the assumption
that n = 2 or 3.

Third, the case n = 4 does occur in physics.


Example 1. One classical example of 3-space is of course the space we 
live in. After we have selected an origin and a coordinate system, we can 
describe the position of a point (body, particle, etc.) by 3 coordi- 
nates. Furthermore, as was known long ago, it is convenient to extend 
this space to a 4-dimensional space, with the fourth coordinate as time, 
the time origin being selected, say, as the birth of Christ—although this 
is purely arbitrary (it might be more convenient to select the birth of the 
solar system, or the birth of the earth as the origin, if we could deter- 
mine these accurately). Then a point with negative time coordinate is a 
BC point, and a point with positive time coordinate is an AD point. 


Don’t get the idea that “time is the fourth dimension”, however. The 
above 4-dimensional space is only one possible example. In economics, 
for instance, one uses a very different space, taking for coordinates, say, 
the number of dollars expended in an industry. For instance, we could 
deal with a 7-dimensional space with coordinates corresponding to the 
following industries: 


1. Steel 2. Auto 3. Farm products 4. Fish 
5. Chemicals 6. Clothing 7. Transportation. 


We agree that a megabuck per year is the unit of measurement. Then a 
point 


(1,000, 800, 550, 300, 700, 200, 900) 


in this 7-space would mean that the steel industry spent one billion 
dollars in the given year, and that the chemical industry spent 700 mil- 
lion dollars in that year. 

The idea of regarding time as a fourth dimension is an old one. 
Already in the Encyclopédie of Diderot, dating back to the eighteenth 
century, d’Alembert writes in his article on “dimension”: 


Cette manière de considérer les quantités de plus de trois dimensions est
aussi exacte que l'autre, car les lettres peuvent toujours être regardées
comme représentant des nombres rationnels ou non. J'ai dit plus haut qu'il
n'était pas possible de concevoir plus de trois dimensions. Un homme
d'esprit de ma connaissance croit qu'on pourrait cependant regarder la
durée comme une quatrième dimension, et que le produit temps par la
solidité serait en quelque manière un produit de quatre dimensions; cette
idée peut être contestée, mais elle a, ce me semble, quelque mérite, quand
ce ne serait que celui de la nouveauté.


Encyclopédie, Vol. 4 (1754), p. 1010 
Translated, this means:


This way of considering quantities having more than three dimensions is
just as right as the other, because algebraic letters can always be viewed as 
representing numbers, whether rational or not. I said above that it was 
not possible to conceive more than three dimensions. A clever gentleman 
with whom I am acquainted believes that nevertheless, one could view 
duration as a fourth dimension, and that the product time by solidity 
would be somehow a product of four dimensions. This idea may be chal- 
lenged, but it has, it seems to me, some merit, were it only that of being 
new. 


Observe how d'Alembert refers to a “clever gentleman” when he apparently means himself. He is being rather careful in proposing what must
have been at the time a far-out idea, which became more prevalent in
the twentieth century.

D'Alembert also visualized clearly higher dimensional spaces as “products” of lower dimensional spaces. For instance, we can view 3-space as
putting side by side the first two coordinates $(x_1, x_2)$ and then the third
$x_3$. Thus we write

$$\mathbf{R}^3 = \mathbf{R}^2 \times \mathbf{R}^1.$$

We use the product sign, which should not be confused with other
“products”, like the product of numbers. The word “product” is used in
two contexts. Similarly, we can write

$$\mathbf{R}^4 = \mathbf{R}^3 \times \mathbf{R}^1.$$

There are other ways of expressing $\mathbf{R}^4$ as a product, namely

$$\mathbf{R}^4 = \mathbf{R}^2 \times \mathbf{R}^2.$$

This means that we view separately the first two coordinates $(x_1, x_2)$ and
the last two coordinates $(x_3, x_4)$. We shall come back to such products
later.

We shall now define how to add points. If A, B are two points, say
in 3-space,

$$A = (a_1, a_2, a_3) \quad\text{and}\quad B = (b_1, b_2, b_3),$$

then we define A + B to be the point whose coordinates are

$$A + B = (a_1 + b_1, a_2 + b_2, a_3 + b_3).$$


Example 2. In the plane, if A = (1, 2) and B = (-3, 5), then

$$A + B = (-2, 7).$$

In 3-space, if $A = (-1, \pi, 3)$ and $B = (\sqrt{2}, 7, -2)$, then

$$A + B = (\sqrt{2} - 1, \pi + 7, 1).$$


Using a neutral n to cover both the cases of 2-space and 3-space, the
points would be written

$$A = (a_1, \ldots, a_n), \qquad B = (b_1, \ldots, b_n),$$

and we define A + B to be the point whose coordinates are

$$(a_1 + b_1, \ldots, a_n + b_n).$$


We observe that the following rules are satisfied:

1. (A + B) + C = A + (B + C).
2. A + B = B + A.
3. If we let
   O = (0, 0, ..., 0)
   be the point all of whose coordinates are 0, then
   O + A = A + O = A
   for all A.
4. Let $A = (a_1, \ldots, a_n)$ and let $-A = (-a_1, \ldots, -a_n)$. Then
   A + (-A) = O.

All these properties are very simple, and are true because they are
true for numbers, and addition of n-tuples is defined in terms of addition
of their components, which are numbers.
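
The four rules can also be checked on a machine. The following is only a minimal sketch, assuming that points are modeled as Python tuples; the names `add`, `neg` and `zero` are our own illustrative choices and do not come from the text.

```python
# Points in n-space modeled as tuples of numbers (an illustrative choice).

def add(A, B):
    """Return A + B, adding coordinate by coordinate."""
    if len(A) != len(B):
        raise ValueError("points must lie in the same n-space")
    return tuple(a + b for a, b in zip(A, B))

def neg(A):
    """Return -A, the point whose coordinates are the negatives of those of A."""
    return tuple(-a for a in A)

def zero(n):
    """Return O, the point of n-space all of whose coordinates are 0."""
    return (0,) * n

A, B, C = (1, 2), (-3, 5), (4, -1)
print(add(A, B))                               # (-2, 7), as in Example 2
print(add(add(A, B), C) == add(A, add(B, C)))  # rule 1
print(add(A, B) == add(B, A))                  # rule 2
print(add(zero(2), A) == A == add(A, zero(2))) # rule 3
print(add(A, neg(A)) == zero(2))               # rule 4
```

Such a check on one choice of A, B, C is of course not a proof; the proof is the remark above that each rule holds coordinate by coordinate.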


Note. Do not confuse the number 0 and the n-tuple (0,...,0). We 
usually denote this n-tuple by O, and also call it zero, because no diffi- 
culty can occur in practice. 


We shall now interpret addition and multiplication by numbers geometrically in the plane (you can visualize simultaneously what happens
in 3-space).


Example 3. Let A = (2, 3) and B = (-1, 1). Then

$$A + B = (1, 4).$$

The figure looks like a parallelogram (Fig. 3).


Figure 3 


Example 4. Let A = (3,1) and B = (1,2). Then 
A+ B= (4,3). 


We see again that the geometric representation of our addition looks like 
a parallelogram (Fig. 4). 



Figure 4 


The reason why the figure looks like a parallelogram can be given in 
terms of plane geometry as follows. We obtain B= (1,2) by starting 
from the origin O = (0,0), and moving 1 unit to the right and 2 up. To 
get A+ B, we start from A, and again move 1 unit to the right and 2 
up. Thus the line segments between O and B, and between A and A+ B 
are the hypotenuses of right triangles whose corresponding legs are of 
the same length, and parallel. The above segments are therefore parallel 
and of the same length, as illustrated in Fig. 5. 


Figure 5 
Example 5. If A = (3, 1) again, then -A = (-3, -1). If we plot this
point, we see that -A has opposite direction to A. We may view -A
as the reflection of A through the origin.


Figure 6 


We shall now consider multiplication of A by a number. If c is any
number, we define cA to be the point whose coordinates are

$$(ca_1, \ldots, ca_n).$$

Example 6. If A = (2, -1, 5) and c = 7, then cA = (14, -7, 35).

It is easy to verify the rules:

5. c(A + B) = cA + cB.
6. If $c_1$, $c_2$ are numbers, then
   $(c_1 + c_2)A = c_1 A + c_2 A$ and $(c_1 c_2)A = c_1(c_2 A)$.

Also note that

$$(-1)A = -A.$$
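
As an illustration, multiplication by a number and rules 5 and 6 can be sketched in the same way. The helper names `add` and `scale` and the sample values of B, $c_1$, $c_2$ are assumptions made only for this example.

```python
def add(A, B):
    """Return A + B, adding coordinate by coordinate."""
    return tuple(a + b for a, b in zip(A, B))

def scale(c, A):
    """Return cA, multiplying every coordinate of A by the number c."""
    return tuple(c * a for a in A)

A, B = (2, -1, 5), (1, 0, -2)
c1, c2 = 2, -5

print(scale(7, A))                                           # (14, -7, 35), as in Example 6
print(scale(3, add(A, B)) == add(scale(3, A), scale(3, B)))  # rule 5
print(scale(c1 + c2, A) == add(scale(c1, A), scale(c2, A)))  # rule 6, first part
print(scale(c1 * c2, A) == scale(c1, scale(c2, A)))          # rule 6, second part
print(scale(-1, A) == tuple(-a for a in A))                  # (-1)A = -A
```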
What is the geometric representation of multiplication by a number?

Example 7. Let A = (1, 2) and c = 3. Then

$$cA = (3, 6)$$

as in Fig. 7(a).

Multiplication by 3 amounts to stretching A by 3. Similarly, $\tfrac{1}{2}A$
amounts to stretching A by $\tfrac{1}{2}$, i.e. shrinking A to half its size. In general,
if t is a number, t > 0, we interpret tA as a point in the same direction
as A from the origin, but t times the distance. In fact, we define A and
B to have the same direction if there exists a number c > 0 such that
A = cB. We emphasize that this means A and B have the same direction
with respect to the origin. For simplicity of language, we omit the words
"with respect to the origin".

Multiplication by a negative number reverses the direction. Thus
-3A would be represented as in Fig. 7(b).


Figure 7 (panel (a) shows 3A = (3, 6); panel (b) shows -3A)


We define A, B (neither of which is zero) to have opposite directions if
there is a number c < 0 such that cA = B. Thus when B = -A, then A,
B have opposite direction.


Exercises I, §1


Find A + B, A - B, 3A, -2B in each of the following cases. Draw the points of
Exercises 1 and 2 on a sheet of graph paper.


3. A = (2, -1, 5), B = (-1, 1, 1)            4. A = (-1, -2, 3), B = (-1, 3, -4)
5. $A = (\pi, 3, -1)$, $B = (2\pi, -3, 7)$   6. $A = (15, -2, 4)$, $B = (\pi, 3, -1)$


7. Let A = (1, 2) and B = (3, 1). Draw A + B, A + 2B, A + 3B, A - B, A - 2B,
A - 3B on a sheet of graph paper.


8. Let A, B be as in Exercise 1. Draw the points A + 2B, A + 3B, A - 2B,
A - 3B, A + 4B on a sheet of graph paper.


9. Let A and B be as drawn in Fig. 8. Draw the point A - B.


Figure 8


I, §2. Located Vectors


We define a located vector to be an ordered pair of points which we
write $\overrightarrow{AB}$. (This is not a product.) We visualize this as an arrow between A and B. We call A the beginning point and B the end point of
the located vector (Fig. 9).


Figure 9 (the located vector $\overrightarrow{AB}$, with horizontal side $b_1 - a_1$ and vertical side $b_2 - a_2$)


We observe that in the plane,

$$b_1 = a_1 + (b_1 - a_1).$$

Similarly,

$$b_2 = a_2 + (b_2 - a_2).$$

This means that

$$B = A + (B - A).$$


Let $\overrightarrow{AB}$ and $\overrightarrow{CD}$ be two located vectors. We shall say that they are
equivalent if B - A = D - C. Every located vector $\overrightarrow{AB}$ is equivalent to
one whose beginning point is the origin, because $\overrightarrow{AB}$ is equivalent to
$\overrightarrow{O(B - A)}$. Clearly this is the only located vector whose beginning point
is the origin and which is equivalent to $\overrightarrow{AB}$. If you visualize the parallelogram law in the plane, then it is clear that equivalence of two located
vectors can be interpreted geometrically by saying that the lengths of the
line segments determined by the pair of points are equal, and that the
“directions” in which they point are the same.


In the next figures, we have drawn the located vectors $\overrightarrow{O(B - A)}$,
$\overrightarrow{AB}$, and $\overrightarrow{O(A - B)}$, $\overrightarrow{BA}$.


Figure 10 Figure 11 


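Example 1. Let P = (1, -1, 3) and Q = (2, 4, 1). Then $\overrightarrow{PQ}$ is equivalent to $\overrightarrow{OC}$, where C = Q - P = (1, 5, -2). If

$$A = (4, -2, 5) \quad\text{and}\quad B = (5, 3, 3),$$

then $\overrightarrow{PQ}$ is equivalent to $\overrightarrow{AB}$ because

$$Q - P = B - A = (1, 5, -2).$$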
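
The test for equivalence is a single comparison of differences, and may be sketched as follows. The function names `sub` and `equivalent` are hypothetical helpers chosen for this illustration; a located vector is represented simply by its pair of end points.

```python
def sub(B, A):
    """Return B - A, subtracting coordinate by coordinate."""
    return tuple(b - a for a, b in zip(A, B))

def equivalent(A, B, C, D):
    """True if the located vectors AB and CD are equivalent, i.e. B - A = D - C."""
    return sub(B, A) == sub(D, C)

P, Q = (1, -1, 3), (2, 4, 1)
A, B = (4, -2, 5), (5, 3, 3)
O = (0, 0, 0)

print(sub(Q, P))                       # (1, 5, -2)
print(equivalent(P, Q, A, B))          # True, as in Example 1
print(equivalent(P, Q, O, sub(Q, P)))  # True: PQ is equivalent to O(Q - P)
```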


Given a located vector $\overrightarrow{OC}$ whose beginning point is the origin, we
shall say that it is located at the origin. Given any located vector $\overrightarrow{AB}$,
we shall say that it is located at A.

A located vector at the origin is entirely determined by its end point. 
In view of this, we shall call an n-tuple either a point or a vector, de- 
pending on the interpretation which we have in mind. 


Two located vectors $\overrightarrow{AB}$ and $\overrightarrow{PQ}$ are said to be parallel if there is a
number $c \neq 0$ such that B - A = c(Q - P). They are said to have the
same direction if there is a number c > 0 such that B - A = c(Q - P),
and have opposite direction if there is a number c < 0 such that

$$B - A = c(Q - P).$$


In the next pictures, we illustrate parallel located vectors. 


(a) Same direction    (b) Opposite direction


Figure 12 


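Example 2. Let

$$P = (3, 7) \quad\text{and}\quad Q = (-4, 2).$$

Let

$$A = (5, 1) \quad\text{and}\quad B = (-16, -14).$$

Then

$$Q - P = (-7, -5) \quad\text{and}\quad B - A = (-21, -15).$$

Hence $\overrightarrow{PQ}$ is parallel to $\overrightarrow{AB}$, because B - A = 3(Q - P). Since 3 > 0,
we even see that $\overrightarrow{PQ}$ and $\overrightarrow{AB}$ have the same direction.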
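
To decide whether two located vectors are parallel, one has to find the number c, if it exists. The sketch below is one possible way to do this, under the assumption that coordinates are integers or fractions so that exact arithmetic can be used; the names `scalar_multiple` and `direction` are ours. The point P = (3, 7) is the one consistent with Q = (-4, 2) and Q - P = (-7, -5).

```python
from fractions import Fraction

def sub(B, A):
    """Return B - A, subtracting coordinate by coordinate."""
    return tuple(b - a for a, b in zip(A, B))

def scalar_multiple(V, W):
    """Return the number c with V = cW, or None if there is no such c.

    Assumes W is not the zero vector.
    """
    c = None
    for v, w in zip(V, W):
        if w == 0:
            if v != 0:
                return None          # no number c gives c * 0 = v
        else:
            ratio = Fraction(v) / Fraction(w)
            if c is None:
                c = ratio
            elif ratio != c:
                return None          # the coordinates demand different values of c
    return c

def direction(A, B, P, Q):
    """Compare the located vectors AB and PQ."""
    c = scalar_multiple(sub(B, A), sub(Q, P))
    if c is None or c == 0:
        return "not parallel"
    return "same direction" if c > 0 else "opposite direction"

P, Q = (3, 7), (-4, 2)
A, B = (5, 1), (-16, -14)
print(scalar_multiple(sub(B, A), sub(Q, P)))  # 3
print(direction(A, B, P, Q))                  # same direction
print(direction(A, B, Q, P))                  # opposite direction
```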

In a similar manner, any definition made concerning n-tuples can be
carried over to located vectors. For instance, in the next section, we
shall define what it means for n-tuples to be perpendicular.
Then we can say that two located vectors $\overrightarrow{AB}$ and $\overrightarrow{PQ}$ are perpendicular
if B - A is perpendicular to Q - P. In Fig. 13, we have drawn a picture
of such vectors in the plane.


Figure 13


Exercises I, §2


In each case, determine which located vectors $\overrightarrow{PQ}$ and $\overrightarrow{AB}$ are equivalent.

1. P = (1, -1), Q = (4, 3), A = (-1, 5), B = (5, 2).

2. P = (1, 4), Q = (3, 5), A = (5, 7), B = (1, 8).

3. P = (1, -1, 5), Q = (2, 3, -4), A = (3, 1, 1), B = (0, 5, 10).

4. P = (2, 3, -4), Q = (-1, 3, 5), A = (-2, 3, -1), B = (-5, 3, 8).

In each case, determine which located vectors $\overrightarrow{PQ}$ and $\overrightarrow{AB}$ are parallel.

5. P = (1, -1), Q = (4, 3), A = (-1, 5), B = (7, 1).

6. P = (1, 4), Q = (3, 5), A = (5, 7), B = (9, 6).

7. P = (1, -1, 5), Q = (-2, 3, -4), A = (3, 1, 1), B = (-3, 9, -17).

8. P = (2, 3, -4), Q = (-1, 3, 5), A = (-2, 3, -1), B = (-11, 3, -28).


9. Draw the located vectors of Exercises 1, 2, 5, and 6 on a sheet of paper to
illustrate these exercises. Also draw the located vectors $\overrightarrow{QP}$ and $\overrightarrow{BA}$. Draw
the points Q - P, B - A, P - Q, and A - B.


I, §3. Scalar Product


It is understood that throughout a discussion we select vectors always in
the same n-dimensional space. You may think of the cases n = 2 and
n = 3 only.

In 2-space, let $A = (a_1, a_2)$ and $B = (b_1, b_2)$. We define their scalar
product to be

$$A \cdot B = a_1 b_1 + a_2 b_2.$$

In 3-space, let $A = (a_1, a_2, a_3)$ and $B = (b_1, b_2, b_3)$. We define their
scalar product to be

$$A \cdot B = a_1 b_1 + a_2 b_2 + a_3 b_3.$$

In n-space, covering both cases with one notation, let $A = (a_1, \ldots, a_n)$
and $B = (b_1, \ldots, b_n)$ be two vectors. We define their scalar or dot product
$A \cdot B$ to be

$$a_1 b_1 + \cdots + a_n b_n.$$

This product is a number. For instance, if

$$A = (1, 3, -2) \quad\text{and}\quad B = (-1, 4, -3),$$

then

$$A \cdot B = -1 + 12 + 6 = 17.$$
For the moment, we do not give a geometric interpretation to this scalar
product. We shall do this later. We derive first some important properties. The basic ones are:

SP 1. We have $A \cdot B = B \cdot A$.

SP 2. If A, B, C are three vectors, then
$$A \cdot (B + C) = A \cdot B + A \cdot C = (B + C) \cdot A.$$

SP 3. If x is a number, then
$$(xA) \cdot B = x(A \cdot B) \quad\text{and}\quad A \cdot (xB) = x(A \cdot B).$$

SP 4. If A = O is the zero vector, then $A \cdot A = 0$, and otherwise
$$A \cdot A > 0.$$


We shall now prove these properties.
Concerning the first, we have

$$a_1 b_1 + \cdots + a_n b_n = b_1 a_1 + \cdots + b_n a_n,$$

because for any two numbers a, b, we have ab = ba. This proves the
first property.
For SP 2, let $C = (c_1, \ldots, c_n)$. Then

$$B + C = (b_1 + c_1, \ldots, b_n + c_n)$$

and

$$A \cdot (B + C) = a_1(b_1 + c_1) + \cdots + a_n(b_n + c_n) = a_1 b_1 + a_1 c_1 + \cdots + a_n b_n + a_n c_n.$$

Reordering the terms yields

$$a_1 b_1 + \cdots + a_n b_n + a_1 c_1 + \cdots + a_n c_n,$$

which is none other than $A \cdot B + A \cdot C$. This proves what we wanted.
We leave property SP 3 as an exercise.

Finally, for SP 4, we observe that if one coordinate $a_i$ of A is not
equal to 0, then there is a term $a_i^2 \neq 0$ and $a_i^2 > 0$ in the scalar product

$$A \cdot A = a_1^2 + \cdots + a_n^2.$$

Since every term is $\geq 0$, it follows that the sum is > 0, as was to be
shown.
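
The definition of the scalar product and the four properties can be tried out numerically. This is a sketch only, with tuples standing for vectors and with helper names (`dot`, `add`, `scale`) and a sample vector C that are assumptions of the example rather than notation from the text.

```python
def dot(A, B):
    """Return the scalar product a_1 b_1 + ... + a_n b_n."""
    if len(A) != len(B):
        raise ValueError("vectors must lie in the same n-space")
    return sum(a * b for a, b in zip(A, B))

def add(A, B):
    """Return A + B, adding coordinate by coordinate."""
    return tuple(a + b for a, b in zip(A, B))

def scale(x, A):
    """Return xA, multiplying every coordinate of A by the number x."""
    return tuple(x * a for a in A)

A, B, C = (1, 3, -2), (-1, 4, -3), (2, 0, 5)
x = 4

print(dot(A, B))                                            # 17, as computed above
print(dot(A, B) == dot(B, A))                               # SP 1
print(dot(A, add(B, C)) == dot(A, B) + dot(A, C))           # SP 2
print(dot(scale(x, A), B) == x * dot(A, B) == dot(A, scale(x, B)))  # SP 3
print(dot(A, A) > 0 and dot((0, 0, 0), (0, 0, 0)) == 0)     # SP 4
```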


In much of the work which we shall do concerning vectors, we shall 
use only the ordinary properties of addition, multiplication by numbers, 
and the four properties of the scalar product. We shall give a formal 
discussion of these later. For the moment, observe that there are other 
objects with which you are familiar and which can be added, subtracted, 
and multiplied by numbers, for instance the continuous functions on an 
interval [a,b] (cf. Example 2 of Chapter VI, §1). 

Instead of writing $A \cdot A$ for the scalar product of a vector with itself, it
will be convenient to write also $A^2$. (This is the only instance when we
allow ourselves such a notation. Thus $A^3$ has no meaning.) As an exercise, verify the following identities:

$$(A + B)^2 = A^2 + 2A \cdot B + B^2,$$
$$(A - B)^2 = A^2 - 2A \cdot B + B^2.$$


A dot product $A \cdot B$ may very well be equal to 0 without either A or
B being the zero vector. For instance, let

$$A = (1, 2, 3) \quad\text{and}\quad B = (2, 1, -\tfrac{4}{3}).$$

Then

$$A \cdot B = 0.$$

We define two vectors A, B to be perpendicular (or as we shall also
say, orthogonal), if $A \cdot B = 0$. For the moment, it is not clear that in the
plane, this definition coincides with our intuitive geometric notion of
perpendicularity. We shall convince you that it does in the next section.
Here we merely note an example. Say in $\mathbf{R}^3$, let

$$E_1 = (1, 0, 0), \quad E_2 = (0, 1, 0), \quad E_3 = (0, 0, 1)$$

be the three unit vectors, as shown on the diagram (Fig. 14).


Figure 14


Then we see that $E_1 \cdot E_2 = 0$, and similarly $E_i \cdot E_j = 0$ if $i \neq j$. And
these vectors look perpendicular. If $A = (a_1, a_2, a_3)$, then we observe that
the i-th component of A, namely

$$a_i = A \cdot E_i,$$

is the dot product of A with the i-th unit vector. We see that A is
perpendicular to $E_i$ (according to our definition of perpendicularity with
the dot product) if and only if its i-th component is equal to 0.


Exercises I, §3 


1. Find $A \cdot A$ for each of the following n-tuples.
(a) A = (2, -1), B = (-1, 1)                 (b) A = (-1, 3), B = (0, 4)
(c) A = (2, -1, 5), B = (-1, 1, 1)           (d) A = (-1, -2, 3), B = (-1, 3, -4)
(e) $A = (\pi, 3, -1)$, $B = (2\pi, -3, 7)$  (f) $A = (15, -2, 4)$, $B = (\pi, 3, -1)$


2. Find $A \cdot B$ for each of the above n-tuples.


3. Using only the four properties of the scalar product, verify in detail the identities given in the text for $(A + B)^2$ and $(A - B)^2$.


4. Which of the following pairs of vectors are perpendicular?
(a) (1, -1, 1) and (2, 1, 5)       (b) (1, -1, 1) and (2, 3, 1)
(c) (-5, 2, 7) and (3, -1, 2)      (d) $(\pi, 2, 1)$ and $(2, -\pi, 0)$


5. Let A be a vector perpendicular to every vector X. Show that A = O. 


I, §4. The Norm of a Vector


We define the norm of a vector A, and denote by $\|A\|$, the number

$$\|A\| = \sqrt{A \cdot A}.$$

Since $A \cdot A \geq 0$, we can take the square root. The norm is also sometimes called the magnitude of A.


When n = 2 and A = (a, b), then

$$\|A\| = \sqrt{a^2 + b^2},$$

as in the following picture (Fig. 15).


Figure 15 


Example 1. If A = (1, 2), then

$$\|A\| = \sqrt{1 + 4} = \sqrt{5}.$$

When n = 3 and $A = (a_1, a_2, a_3)$, then

$$\|A\| = \sqrt{a_1^2 + a_2^2 + a_3^2}.$$

Example 2. If A = (-1, 2, 3), then

$$\|A\| = \sqrt{1 + 4 + 9} = \sqrt{14}.$$


If n = 3, then the picture looks like Fig. 16, with A = (x, y, z).


Figure 16


If we first look at the two components (x, y), then the length of the
segment between (0, 0) and (x, y) is equal to $w = \sqrt{x^2 + y^2}$, as indicated.
Then again the norm of A by the Pythagoras theorem would be

$$\sqrt{w^2 + z^2} = \sqrt{x^2 + y^2 + z^2}.$$

Thus when n = 3, our definition of norm is compatible with the geometry of the Pythagoras theorem.
In terms of coordinates, $A = (a_1, \ldots, a_n)$ we see that

$$\|A\| = \sqrt{a_1^2 + \cdots + a_n^2}.$$

If $A \neq O$, then $\|A\| \neq 0$ because some coordinate $a_i \neq 0$, so that $a_i^2 > 0$,
and hence $a_1^2 + \cdots + a_n^2 > 0$, so $\|A\| \neq 0$.

Observe that for any vector A we have

$$\|A\| = \|-A\|.$$

This is due to the fact that

$$(-a_1)^2 + \cdots + (-a_n)^2 = a_1^2 + \cdots + a_n^2,$$

because $(-1)^2 = 1$. Of course, this is as it should be from the picture:


Figure 17


Recall that A and —A are said to have opposite direction. However, 
they have the same norm (magnitude, as is sometimes said when speak- 
ing of vectors). 
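
The norm is easily computed from the scalar product. The following sketch assumes floating point arithmetic through Python's standard `math.sqrt`; the function names `dot` and `norm` and the sample vector (3, -4) are illustrative choices.

```python
import math

def dot(A, B):
    """Return the scalar product a_1 b_1 + ... + a_n b_n."""
    return sum(a * b for a, b in zip(A, B))

def norm(A):
    """Return the norm of A, the square root of A . A."""
    return math.sqrt(dot(A, A))

print(math.isclose(norm((1, 2)), math.sqrt(5)))       # Example 1
print(math.isclose(norm((-1, 2, 3)), math.sqrt(14)))  # Example 2

A = (3, -4)
minus_A = tuple(-a for a in A)
print(norm(A))                   # 5.0
print(norm(A) == norm(minus_A))  # True: A and -A have the same norm
```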

Let A, B be two points. We define the distance between A and B to
be

$$\|A - B\| = \sqrt{(A - B) \cdot (A - B)}.$$

This definition coincides with our geometric intuition when A, B are
points in the plane (Fig. 18). It is the same thing as the length of the
located vector $\overrightarrow{AB}$ or the located vector $\overrightarrow{BA}$.


Length = $\|A - B\| = \|B - A\|$


Figure 18


Example 3. Let A = (-1, 2) and B = (3, 4). Then the length of the
located vector $\overrightarrow{AB}$ is $\|B - A\|$. But B - A = (4, 2). Thus

$$\|B - A\| = \sqrt{16 + 4} = \sqrt{20}.$$


In the picture, we see that the horizontal side has length 4 and the 
vertical side has length 2. Thus our definitions reflect our geometric 
intuition derived from Pythagoras. 
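
The distance of Example 3 can be recomputed as follows. This is a small sketch with a hypothetical helper `distance`, written directly from the definition.

```python
import math

def distance(A, B):
    """Return the distance between A and B, i.e. the norm of A - B."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(A, B)))

A, B = (-1, 2), (3, 4)
print(distance(A, B))                               # approximately 4.472
print(math.isclose(distance(A, B), math.sqrt(20)))  # True, as in Example 3
print(distance(A, B) == distance(B, A))             # True: AB and BA have the same length
```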


Figure 19 


Let P be a point in the plane, and let a be a number > 0. The set of
points X such that

$$\|X - P\| < a$$

will be called the open disc of radius a centered at P. The set of points
X such that

$$\|X - P\| \leq a$$

will be called the closed disc of radius a and center P. The set of points
X such that

$$\|X - P\| = a$$

is called the circle of radius a and center P. These are illustrated in Fig.
20.


Circle    Disc


Figure 20


In 3-dimensional space, the set of points X such that

$$\|X - P\| < a$$

will be called the open ball of radius a and center P. The set of points X
such that

$$\|X - P\| \leq a$$

will be called the closed ball of radius a and center P. The set of points
X such that

$$\|X - P\| = a$$

will be called the sphere of radius a and center P. In higher dimensional
space, one uses this same terminology of ball and sphere.
Figure 21 illustrates a sphere and a ball in 3-space.


Sphere Ball 


Figure 21 
The sphere is the outer shell, and the ball consists of the region inside
the shell. The open ball consists of the region inside the shell excluding 
the shell itself. The closed ball consists of the region inside the shell and 
the shell itself. 
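
As an illustration of these three definitions, one can classify a point X relative to the sphere of radius a and center P. The sketch below compares squared distances, so that no square root is needed and integer inputs are handled exactly; the function name `classify` and its return strings are assumptions of the example.

```python
def classify(X, P, a):
    """Locate X relative to the sphere of radius a > 0 and center P."""
    d2 = sum((x - p) ** 2 for x, p in zip(X, P))  # squared distance from X to P
    r2 = a * a
    if d2 < r2:
        return "in the open ball"
    if d2 == r2:
        return "on the sphere"
    return "outside the closed ball"

P = (0, 0, 0)
print(classify((1, 1, 1), P, 3))  # in the open ball, since 3 < 9
print(classify((1, 2, 2), P, 3))  # on the sphere, since 1 + 4 + 4 = 9
print(classify((3, 3, 3), P, 3))  # outside the closed ball, since 27 > 9
```

A point lies in the closed ball exactly when it is in the open ball or on the sphere.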

From the geometry of the situation, it is also reasonable to expect
that if c > 0, then $\|cA\| = c\|A\|$, i.e. if we stretch a vector A by multiplying by a positive number c, then the length stretches also by that
amount. We verify this formally using our definition of the length.


Theorem 4.1. Let x be a number. Then

$$\|xA\| = |x|\, \|A\|$$

(absolute value of x times the norm of A).

Proof. By definition, we have

$$\|xA\|^2 = (xA) \cdot (xA),$$

which is equal to

$$x^2 (A \cdot A)$$

by the properties of the scalar product. Taking the square root now
yields what we want, since $\sqrt{x^2} = |x|$.

Let $S_1$ be the sphere of radius 1, centered at the origin. Let a be a
number > 0. If X is a point of the sphere $S_1$, then aX is a point of the
sphere of radius a, because

$$\|aX\| = a\|X\| = a.$$


In this manner, we get all points of the sphere of radius a. (Proof?) 
Thus the sphere of radius a is obtained by stretching the sphere of radius 
1, through multiplication by a. 

A similar remark applies to the open and closed balls of radius a, 
they being obtained from the open and closed balls of radius 1 through 
multiplication by a. 


Disc of radius 1 Disc of radius a 


Figure 22 
We shall say that a vector E is a unit vector if $\|E\| = 1$. Given any
vector A, let $a = \|A\|$. If $a \neq 0$, then

$$\frac{1}{a} A$$

is a unit vector, because

$$\left\| \frac{1}{a} A \right\| = \frac{1}{a}\, a = 1.$$

We say that two vectors A, B (neither of which is O) have the same
direction if there is a number c > 0 such that cA = B. In view of this
definition, we see that the vector

$$\frac{1}{\|A\|} A$$

is a unit vector in the direction of A (provided $A \neq O$).


Figure 23 


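If E is the unit vector in the direction of A, and $\|A\| = a$, then

$$A = aE.$$

Example 4. Let A = (1, 2, -3). Then $\|A\| = \sqrt{14}$. Hence the unit
vector in the direction of A is the vector

$$E = \left( \frac{1}{\sqrt{14}}, \frac{2}{\sqrt{14}}, \frac{-3}{\sqrt{14}} \right).$$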
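
The passage from A to the unit vector in its direction can be sketched as below. The helper names `norm` and `unit` are ours, and the results are floating point approximations, so the comparisons use `math.isclose` rather than exact equality.

```python
import math

def norm(A):
    """Return the norm of A."""
    return math.sqrt(sum(a * a for a in A))

def unit(A):
    """Return (1/||A||) A, the unit vector in the direction of A."""
    a = norm(A)
    if a == 0:
        raise ValueError("the zero vector has no direction")
    return tuple(x / a for x in A)

A = (1, 2, -3)
E = unit(A)
print(E)                                       # approximately (0.267, 0.535, -0.802)
print(math.isclose(norm(E), 1.0))              # True: E is a unit vector
print(all(math.isclose(norm(A) * e, x) for e, x in zip(E, A)))  # True: A = aE
```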


Warning. There are as many unit vectors as there are directions. The 
three standard unit vectors in 3-space, namely 


$$E_1 = (1, 0, 0), \quad E_2 = (0, 1, 0), \quad E_3 = (0, 0, 1)$$


are merely the three unit vectors in the directions of the coordinate axes. 
We are also in the position to justify our definition of perpendicularity. Given A, B in the plane, the condition that

$$\|A + B\| = \|A - B\|$$

(illustrated in Fig. 24(b)) coincides with the geometric property that A
should be perpendicular to B.


(a) (b) 
Figure 24 


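We shall prove:

$$\|A + B\| = \|A - B\| \quad\text{if and only if}\quad A \cdot B = 0.$$

Let $\Longleftrightarrow$ denote “if and only if”. Then

$$\begin{aligned}
\|A + B\| = \|A - B\| &\Longleftrightarrow \|A + B\|^2 = \|A - B\|^2 \\
&\Longleftrightarrow A^2 + 2A \cdot B + B^2 = A^2 - 2A \cdot B + B^2 \\
&\Longleftrightarrow 4A \cdot B = 0 \\
&\Longleftrightarrow A \cdot B = 0.
\end{aligned}$$

This proves what we wanted.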
General Pythagoras theorem. If A and B are perpendicular, then

$$\|A + B\|^2 = \|A\|^2 + \|B\|^2.$$

The theorem is illustrated on Fig. 25.


Figure 25


To prove this, we use the definitions, namely

$$\|A + B\|^2 = (A + B) \cdot (A + B) = A^2 + 2A \cdot B + B^2 = \|A\|^2 + \|B\|^2,$$

because $A \cdot B = 0$, and $A \cdot A = \|A\|^2$, $B \cdot B = \|B\|^2$ by definition.
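
Both the perpendicularity criterion and the Pythagoras theorem can be checked on an example. The sketch below works with squared norms, so that integer inputs give exact results; the sample vectors A, B, C and the helper name `norm_sq` are chosen only for illustration.

```python
def dot(A, B):
    """Return the scalar product of A and B."""
    return sum(a * b for a, b in zip(A, B))

def add(A, B):
    return tuple(a + b for a, b in zip(A, B))

def sub(A, B):
    """Return A - B."""
    return tuple(a - b for a, b in zip(A, B))

def norm_sq(A):
    """Return the square of the norm of A, namely A . A."""
    return dot(A, A)

A, B = (3, 1), (-1, 3)            # a perpendicular pair
print(dot(A, B))                                        # 0
print(norm_sq(add(A, B)) == norm_sq(sub(A, B)))         # True: 20 == 20
print(norm_sq(add(A, B)) == norm_sq(A) + norm_sq(B))    # True: 20 == 10 + 10

C = (1, 2)                        # not perpendicular to A
print(dot(A, C))                                        # 5
print(norm_sq(add(A, C)) - norm_sq(sub(A, C)) == 4 * dot(A, C))  # True: 25 - 5 == 20
```

The last line is the identity behind the chain of equivalences above: the two squared norms differ by exactly $4A \cdot B$.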


Remark. If A is perpendicular to B, and x is any number, then A is
also perpendicular to xB because

$$A \cdot xB = xA \cdot B = 0.$$


We shall now use the notion of perpendicularity to derive the notion
of projection. Let A, B be two vectors and $B \neq O$. Let P be the point
on the line through $\overrightarrow{OB}$ such that $\overrightarrow{PA}$ is perpendicular to $\overrightarrow{OB}$, as
shown on Fig. 26(a).


(a)    (b)
Figure 26


We can write

$$P = cB$$

for some number c. We want to find this number c explicitly in terms of
A and B. The condition $\overrightarrow{PA} \perp \overrightarrow{OB}$ means that

A - P is perpendicular to B,

and since P = cB this means that

$$(A - cB) \cdot B = 0,$$

in other words,

$$A \cdot B - cB \cdot B = 0.$$

We can solve for c, and we find $A \cdot B = cB \cdot B$, so that

$$c = \frac{A \cdot B}{B \cdot B}.$$

Conversely, if we take this value for c, and then use distributivity, dotting A - cB with B yields 0, so that A - cB is perpendicular to B.
Hence we have seen that there is a unique number c such that A - cB is
perpendicular to B, and c is given by the above formula.


Definition. The component of A along B is the number $c = \dfrac{A \cdot B}{B \cdot B}$.
The projection of A along B is the vector $cB = \dfrac{A \cdot B}{B \cdot B} B$.


Example 5. Suppose

$$B = E_i = (0, \ldots, 0, 1, 0, \ldots, 0)$$

is the i-th unit vector, with 1 in the i-th component and 0 in all other
components.
If $A = (a_1, \ldots, a_n)$, then $A \cdot E_i = a_i$.


Thus $A \cdot E_i$ is the ordinary i-th component of A.


More generally, if B is a unit vector, not necessarily one of the $E_i$, then
we have simply

$$c = A \cdot B$$

because $B \cdot B = 1$ by definition of a unit vector.


Example 6. Let A = (1, 2, -3) and B = (1, 1, 2). Then the component
of A along B is the number

$$c = \frac{A \cdot B}{B \cdot B} = \frac{1 + 2 - 6}{1 + 1 + 4} = \frac{-3}{6} = -\frac{1}{2}.$$
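
The component and the projection can be computed directly from the definition. The sketch below uses Python's `fractions.Fraction` so that the value of c comes out exactly; this choice, like the names `component` and `projection`, is an assumption made for the illustration, with A and B taken from Example 6.

```python
from fractions import Fraction

def dot(A, B):
    """Return the scalar product of A and B."""
    return sum(a * b for a, b in zip(A, B))

def component(A, B):
    """Return c = (A . B) / (B . B), the component of A along B (B not O)."""
    bb = dot(B, B)
    if bb == 0:
        raise ValueError("B must not be the zero vector")
    return Fraction(dot(A, B)) / Fraction(bb)

def projection(A, B):
    """Return cB, the projection of A along B."""
    c = component(A, B)
    return tuple(c * b for b in B)

A, B = (1, 2, -3), (1, 1, 2)
c = component(A, B)
P = projection(A, B)
print(c)                                   # -1/2

# A - cB is perpendicular to B, which is how c was determined.
A_minus_P = tuple(a - p for a, p in zip(A, P))
print(dot(A_minus_P, B) == 0)              # True
```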


Our construction gives an immediate geometric interpretation for the
scalar product. Namely, assume $A \neq O$ and look at the angle $\theta$ between
A and B (Fig. 27). Then from plane geometry we see that

$$\cos\theta = \frac{c\|B\|}{\|A\|},$$

or substituting the value for c obtained above,

$$A \cdot B = \|A\| \|B\| \cos\theta \quad\text{and}\quad \cos\theta = \frac{A \cdot B}{\|A\| \|B\|}.$$


Figure 27


In some treatments of vectors, one takes the relation

$$A \cdot B = \|A\|\,\|B\| \cos\theta$$

as the definition of the scalar product. This is subject to the following
disadvantages, not to say objections:


(a) The four properties of the scalar product SP 1 through SP 4 are 
then by no means obvious. 

(b) Even in 3-space, one has to rely on geometric intuition to obtain 
the cosine of the angle between A and B, and this intuition is 
less clear than in the plane. In higher dimensional space, it fails 
even more. 

(c) It is extremely hard to work with such a definition to obtain 
further properties of the scalar product. 


Thus we prefer to lay obvious algebraic foundations, and then recover
very simply all the properties. We used plane geometry to see the expression

$$A \cdot B = \|A\|\,\|B\| \cos\theta.$$


After working out some examples, we shall prove the inequality which 
allows us to justify this in n-space. 


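Example 7. Let A = (1, 2, −3) and B = (2, 1, 5). Find the cosine of
the angle $\theta$ between A and B.
By definition,

$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{2 + 2 - 15}{\sqrt{14}\,\sqrt{30}} = \frac{-11}{\sqrt{420}}.$$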


Example 8. Find the cosine of the angle between the two located
vectors $\overrightarrow{PQ}$ and $\overrightarrow{PR}$ where

$$P = (1, 2, -3), \quad Q = (-2, 1, 5), \quad R = (1, 1, -4).$$

The picture looks like this:

Figure 28

We let

$$A = Q - P = (-3, -1, 8) \quad\text{and}\quad B = R - P = (0, -1, -1).$$


Then the angle between PQ and PR is the same as that between A and 
B. Hence its cosine is equal to 


$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{0 + 1 - 8}{\sqrt{74}\,\sqrt{2}} = \frac{-7}{\sqrt{74}\,\sqrt{2}}.$$
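The two cosine computations can be checked numerically. This is a small sketch using floating-point arithmetic; the helper names `norm`, `cosine` and `cosine_at` are assumptions made for illustration.

```python
import math


def dot(a, b):
    """Scalar product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Norm (length) of a vector, the square root of a.a."""
    return math.sqrt(dot(a, a))


def cosine(a, b):
    """cos(theta) = (a.b) / (|a| |b|); both vectors must be non-zero."""
    na, nb = norm(a), norm(b)
    if na == 0 or nb == 0:
        raise ValueError("the angle is undefined for the zero vector")
    return dot(a, b) / (na * nb)


def cosine_at(p, q, r):
    """Cosine of the angle at p between the located vectors pq and pr."""
    a = tuple(x - y for x, y in zip(q, p))
    b = tuple(x - y for x, y in zip(r, p))
    return cosine(a, b)
```

With the data of Example 8, `cosine_at((1, 2, -3), (-2, 1, 5), (1, 1, -4))` forms Q − P and R − P first and then applies the same formula as `cosine`.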


We shall prove further properties of the norm and scalar product
using our results on perpendicularity. First note a special case. If

$$E_i = (0, \ldots, 0, 1, 0, \ldots, 0)$$

is the i-th unit vector of $\mathbf{R}^n$, and

$$A = (a_1, \ldots, a_n),$$

then

$$A \cdot E_i = a_i$$

is the i-th component of A, i.e. the component of A along $E_i$. We have

$$|a_i| = \sqrt{a_i^2} \leq \sqrt{a_1^2 + \cdots + a_n^2} = \|A\|,$$


so that the absolute value of each component of A is at most equal to 
the length of A. 

We don't have to deal only with the special unit vector as above. Let
E be any unit vector, that is a vector of norm 1. Let c be the component
of A along E. We saw that

$$c = A \cdot E.$$

Then $A - cE$ is perpendicular to E, and

$$A = A - cE + cE.$$

Then $A - cE$ is also perpendicular to cE, and by the Pythagoras
theorem, we find

$$\|A\|^2 = \|A - cE\|^2 + \|cE\|^2 = \|A - cE\|^2 + c^2.$$

Thus we have the inequality $c^2 \leq \|A\|^2$, and $|c| \leq \|A\|$.


In the next theorem, we generalize this inequality to a dot product
$A \cdot B$ when B is not necessarily a unit vector.

Theorem 4.2. Let A, B be two vectors in $\mathbf{R}^n$. Then

$$|A \cdot B| \leq \|A\|\,\|B\|.$$
Proof. If B = O, then both sides of the inequality are equal to 0, and
so our assertion is obvious. Suppose that $B \neq O$. Let c be the component
of A along B, so $c = (A \cdot B)/(B \cdot B)$. We write

$$A = A - cB + cB.$$

By Pythagoras,

$$\|A\|^2 = \|A - cB\|^2 + \|cB\|^2 = \|A - cB\|^2 + c^2\|B\|^2.$$

Hence $c^2\|B\|^2 \leq \|A\|^2$. But

$$c^2\|B\|^2 = \frac{(A \cdot B)^2}{(B \cdot B)^2}\,\|B\|^2 = \frac{|A \cdot B|^2}{\|B\|^4}\,\|B\|^2 = \frac{|A \cdot B|^2}{\|B\|^2}.$$

Therefore

$$\frac{|A \cdot B|^2}{\|B\|^2} \leq \|A\|^2.$$

Multiply by $\|B\|^2$ and take the square root to conclude the proof.


In view of Theorem 4.2, we see that for non-zero vectors A, B in n-space, the
number

$$\frac{A \cdot B}{\|A\|\,\|B\|}$$

has absolute value $\leq 1$. Consequently,

$$-1 \leq \frac{A \cdot B}{\|A\|\,\|B\|} \leq 1,$$

and there exists a unique angle $\theta$ such that $0 \leq \theta \leq \pi$, and such that

$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|}.$$


We define this angle to be the angle between A and B. 
The inequality of Theorem 4.2 is known as the Schwarz inequality. 
Theorem 4.3. Let A, B be vectors. Then

$$\|A + B\| \leq \|A\| + \|B\|.$$

Proof. Both sides of this inequality are positive or 0. Hence it will
suffice to prove that their squares satisfy the desired inequality, in other
words,

$$(A + B) \cdot (A + B) \leq (\|A\| + \|B\|)^2.$$

To do this, we consider

$$(A + B) \cdot (A + B) = A \cdot A + 2A \cdot B + B \cdot B.$$

In view of our previous result, this satisfies the inequality

$$\leq \|A\|^2 + 2\|A\|\,\|B\| + \|B\|^2,$$

and the right-hand side is none other than

$$(\|A\| + \|B\|)^2.$$

Our theorem is proved. 

Theorem 4.3 is known as the triangle inequality. The reason for this is
that if we draw a triangle as in Fig. 29, then Theorem 4.3 expresses the
fact that the length of one side is $\leq$ the sum of the lengths of the other
two sides.


Figure 29

Remark. None of the proofs uses coordinates; they use only properties SP 1
through SP 4 of the dot product. Hence they remain valid in more general
situations, see Chapter VI. In n-space, they give us inequalities
which are by no means obvious when expressed in terms of coordinates.
For instance, the Schwarz inequality reads, in terms of coordinates:

$$|a_1b_1 + \cdots + a_nb_n| \leq (a_1^2 + \cdots + a_n^2)^{1/2}\,(b_1^2 + \cdots + b_n^2)^{1/2}.$$

Just try to prove this directly, without the “geometric” intuition of Pythagoras,
and see how far you get.
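A numerical spot check is of course no substitute for the proof, but it shows what the Schwarz and triangle inequalities assert in coordinates. The sketch below tests both on randomly chosen integer vectors; the tolerance `tol`, the seed and the function names are assumptions made only for this illustration.

```python
import math
import random


def dot(a, b):
    """Scalar product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Norm of a vector."""
    return math.sqrt(dot(a, a))


def schwarz_holds(a, b, tol=1e-9):
    """Check |a.b| <= |a| |b| up to a small floating-point tolerance."""
    return abs(dot(a, b)) <= norm(a) * norm(b) + tol


def triangle_holds(a, b, tol=1e-9):
    """Check |a + b| <= |a| + |b| up to a small floating-point tolerance."""
    s = [x + y for x, y in zip(a, b)]
    return norm(s) <= norm(a) + norm(b) + tol


random.seed(0)  # fixed seed so the check is reproducible
for _ in range(1000):
    n = random.randint(1, 6)
    a = [random.randint(-10, 10) for _ in range(n)]
    b = [random.randint(-10, 10) for _ in range(n)]
    assert schwarz_holds(a, b) and triangle_holds(a, b)
```

Equality in the Schwarz inequality occurs, for example, when one vector is a multiple of the other, which is why a tolerance is used instead of a strict comparison.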


Exercises I, §4


1. Find the norm of the vector A in the following cases.
(a) A = (2, −1), B = (−1, 1)
(b) A = (−1, 3), B = (0, 4)
(c) A = (2, −1, 5), B = (−1, 1, 1)
(d) A = (−1, −2, 3), B = (−1, 3, −4)
(e) A = (π, 3, −1), B = (2π, −3, 7)
(f) A = (15, −2, 4), B = (π, 3, −1)


2. Find the norm of vector B in the above cases. 
3. Find the projection of A along B in the above cases. 
4. Find the projection of B along A in the above cases. 


5. Find the cosine of the angle between the following vectors A and B.
(a) A = (1, −2) and B = (5, 3)
(b) A = (−3, 4) and B = (2, −1)
(c) A = (1, −2, 3) and B = (−3, 1, 5)
(d) A = (−2, 1, 4) and B = (−1, −1, 3)
(e) A = (−1, 1, 0) and B = (2, 1, −1)


6. Determine the cosine of the angles of the triangle whose vertices are 
(a) (2, —1, 1), (1, i m (3, —4, — 4). 
(b) (3, 1, 1), (— 1,2, 1), (2, —2, 5). 


7. Let $A_1, \ldots, A_r$ be non-zero vectors which are mutually perpendicular, in
other words $A_i \cdot A_j = 0$ if $i \neq j$. Let $c_1, \ldots, c_r$ be numbers such that

$$c_1A_1 + \cdots + c_rA_r = O.$$

Show that all $c_i = 0$.


8. For any vectors A, B, prove the following relations:
(a) $\|A + B\|^2 + \|A - B\|^2 = 2\|A\|^2 + 2\|B\|^2$.
(b) $\|A + B\|^2 = \|A\|^2 + \|B\|^2 + 2A \cdot B$.
(c) $\|A + B\|^2 - \|A - B\|^2 = 4A \cdot B$.
Interpret (a) as a “parallelogram law”.

9. Show that if $\theta$ is the angle between A and B, then

$$\|A - B\|^2 = \|A\|^2 + \|B\|^2 - 2\|A\|\,\|B\| \cos\theta.$$

10. Let A, B, C be three non-zero vectors. If $A \cdot B = A \cdot C$, show by an
example that we do not necessarily have B = C.


I, §5. Parametric Lines


We define the parametric equation or parametric representation of a
straight line passing through a point P in the direction of a vector
$A \neq O$ to be

$$X = P + tA,$$

where t runs through all numbers (Fig. 30).

Figure 30


When we give such a parametric representation, we may think of a 
bug starting from a point P at time t = 0, and moving in the direction of 
A. At time t, the bug is at the position P + tA. Thus we may interpret 
physically the parametric representation as a description of motion, in 
which A is interpreted as the velocity of the bug. At a given time t, the
bug is at the point

$$X(t) = P + tA,$$


which is called the position of the bug at time t. 

This parametric representation is also useful to describe the set of 
points lying on the line segment between two given points. Let P, Q be 
two points. Then the segment between P and Q consists of all the points 


$$S(t) = P + t(Q - P) \quad\text{with}\quad 0 \leq t \leq 1.$$

Indeed, $t(Q - P)$ is a vector having the same direction as $\overrightarrow{PQ}$, as
shown on Fig. 31.

Figure 31


When t = 0, we have S(0) = P, so at time t = 0 the bug is at P. When 
t = 1, we have 


$$S(1) = P + (Q - P) = Q,$$


so when t = 1 the bug is at Q. As t goes from 0 to 1, the bug goes from 
P to Q. 


Example 1. Let P = (1, −3, 4) and Q = (5, 1, −2). Find the coordinates
of the point which lies one third of the distance from P to Q.

Let S(t) as above be the parametric representation of the segment
from P to Q. The desired point is S(1/3), that is:

$$S\left(\tfrac13\right) = P + \tfrac13(Q - P) = (1, -3, 4) + \tfrac13(4, 4, -6) = \left(\tfrac73, -\tfrac53, 2\right).$$
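The segment parametrization S(t) = P + t(Q − P) can be evaluated mechanically. Here is a minimal sketch with exact fractions; the function name `point_on_segment` is an assumption made for illustration.

```python
from fractions import Fraction


def point_on_segment(p, q, t):
    """Return S(t) = P + t(Q - P) for a parameter t with 0 <= t <= 1."""
    t = Fraction(t)
    if not 0 <= t <= 1:
        raise ValueError("t must lie between 0 and 1")
    return tuple(a + t * (b - a) for a, b in zip(p, q))
```

Calling `point_on_segment((1, -3, 4), (5, 1, -2), Fraction(1, 3))` evaluates S(1/3) for the points of Example 1, while t = 0 and t = 1 return P and Q themselves.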


Warning. The desired point in the above example is not given by

$$\frac{P + Q}{3}.$$


Example 2. Find a parametric representation for the line passing 
through the two points P = (1, —3, 1) and Q = (—2,4, 5). 
We first have to find a vector in the direction of the line. We let

$$A = P - Q,$$

so

$$A = (3, -7, -4).$$

The parametric representation of the line is therefore

$$X(t) = P + tA = (1, -3, 1) + t(3, -7, -4).$$


Remark. It would be equally correct to give a parametric representation
of the line as

$$Y(t) = P + tB \quad\text{where}\quad B = Q - P.$$

Interpreted in terms of the moving bug, however, one parametrization
gives the position of a bug moving in one direction along the line, starting
from P at time t = 0, while the other parametrization gives the position
of another bug moving in the opposite direction along the line, also
starting from P at time t = 0.


We shall now discuss the relation between a parametric representation 
and the ordinary equation of a line in the plane. 

Suppose that we work in the plane, and write the coordinates of a 
point X as (x, y). Let P = (p,q) and A = (a,b). Then in terms of the 
coordinates, we can write 

$$x = p + ta, \qquad y = q + tb.$$


We can then eliminate t and obtain the usual equation relating x and y. 
Example 3. Let P = (2, 1) and A = (−1, 5). Then the parametric
representation of the line through P in the direction of A gives us

(*)   $x = 2 - t, \qquad y = 1 + 5t.$

Multiplying the first equation by 5 and adding yields

(**)  $5x + y = 11,$


which is the familiar equation of a line. 
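The elimination of t can be carried out for any line in the plane. The sketch below assumes the line is given by a point P = (p, q) and a non-zero direction A = (a, b); it multiplies the x-equation by b, the y-equation by a, and subtracts, so that t drops out. The function name and the choice of sign for the coefficients are assumptions, since any non-zero multiple of the equation describes the same line.

```python
def implicit_from_parametric(p, a):
    """Return (alpha, beta, gamma) with alpha*x + beta*y = gamma
    for the line X = P + tA in the plane."""
    (p1, p2), (a1, a2) = p, a
    if a1 == 0 and a2 == 0:
        raise ValueError("the direction A must be non-zero")
    # a2*x - a1*y = a2*(p1 + t*a1) - a1*(p2 + t*a2) = a2*p1 - a1*p2,
    # so the parameter t has been eliminated.
    return a2, -a1, a2 * p1 - a1 * p2
```

For P = (2, 1) and A = (−1, 5) this returns the coefficients of the equation found in Example 3.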


This elimination of t shows that every pair (x, y) which satisfies the
parametric representation (*) for some value of t also satisfies equation
(**). Conversely, suppose we have a pair of numbers (x, y) satisfying
(**). Let t = 2 − x. Then

$$y = 11 - 5x = 11 - 5(2 - t) = 1 + 5t.$$

Hence there exists some value of t which satisfies equation (*). Thus we
have proved that the pairs (x, y) which are solutions of (**) are exactly
the same pairs of numbers as those obtained by giving arbitrary values
for t in (*). Thus the straight line can be described parametrically as in
(*) or in terms of its usual equation (**). Starting with the ordinary
equation

$$5x + y = 11,$$

we let t = 2 − x in order to recover the specific parametrization of (*).
When we parametrize a straight line in the form

$$X = P + tA,$$


we have of course infinitely many choices for P on the line, and also 
infinitely many choices for A, differing by a scalar multiple. We can 
always select at least one. Namely, given an equation 


ax + by=c 


with numbers a, b, c, suppose that $a \neq 0$. We use y as parameter, and
let

$$y = t.$$

Then we can solve for x, namely

$$x = \frac{c}{a} - \frac{b}{a}\,t.$$

Let P = (c/a, 0) and A = (−b/a, 1). We see that an arbitrary point (x, y)
satisfying the equation

$$ax + by = c$$

can be expressed parametrically, namely

$$(x, y) = P + tA.$$
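The construction of P and A from the equation ax + by = c can be sketched as follows, assuming integer coefficients with a non-zero so that exact fractions can be used. The function name is hypothetical.

```python
from fractions import Fraction


def parametric_from_implicit(a, b, c):
    """Return (P, A) with P = (c/a, 0) and A = (-b/a, 1), so that the
    points P + tA are exactly the solutions of a*x + b*y = c."""
    if a == 0:
        raise ValueError("this construction assumes a != 0")
    p = (Fraction(c, a), Fraction(0))
    direction = (Fraction(-b, a), Fraction(1))
    return p, direction
```

Substituting x = c/a − (b/a)t and y = t back into ax + by gives c for every t, which is the check one would run on the result.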
In higher dimensions, starting with a parametric representation 
$$X = P + tA,$$


we cannot eliminate t, and thus the parametric representation is the only 
one available to describe a straight line.

Exercises I, §5


1. Find a parametric representation for the line passing through the following
pairs of points.
(a) $P_1 = (1, 3, -1)$ and $P_2 = (-4, 1, 2)$
(b) $P_1 = (-1, 5, 3)$ and $P_2 = (-2, 4, 7)$


Find a parametric representation for the line passing through the following
points.


2. (1, 1, −1) and (−2, 1, 3)     3. (−1, 5, 2) and (3, −4, 1)


4. Let P = (1, 3, −1) and Q = (−4, 5, 2). Determine the coordinates of the
following points:
(a) The midpoint of the line segment between P and Q. 
(b) The two points on this line segment lying one-third and two-thirds of the 
way from P to Q. 
(c) The point lying one-fifth of the way from P to Q. 
(d) The point lying two-fifths of the way from P to Q. 


5. If P, Q are two arbitrary points in n-space, give the general formula for the 
midpoint of the line segment between P and Q. 


I, §6. Planes


We can describe planes in 3-space by an equation analogous to the 
single equation of the line. We proceed as follows. 


Figure 32


Let P be a point in 3-space and consider a located vector $\overrightarrow{ON}$. We
define the plane passing through P perpendicular to $\overrightarrow{ON}$ to be the
collection of all points X such that the located vector $\overrightarrow{PX}$ is perpendicular
to $\overrightarrow{ON}$. According to our definitions, this amounts to the condition

$$(X - P) \cdot N = 0,$$

which can also be written as

$$X \cdot N = P \cdot N.$$

We shall also say that this plane is the one perpendicular to N, and
consists of all vectors X such that $X - P$ is perpendicular to N. We
have drawn a typical situation in 3-space in Fig. 32.

Instead of saying that N is perpendicular to the plane, one also says 
that N is normal to the plane. 

Let t be a number $\neq 0$. Then the set of points X such that

$$(X - P) \cdot N = 0$$

coincides with the set of points X such that

$$(X - P) \cdot tN = 0.$$

Thus we may say that our plane is the plane passing through P and
perpendicular to the line in the direction of N. To find the equation of
the plane, we could use any vector tN (with $t \neq 0$) instead of N.


Example 1. Let

$$P = (2, 1, -1) \quad\text{and}\quad N = (-1, 1, 3).$$

Let X = (x, y, z). Then

$$X \cdot N = (-1)x + y + 3z.$$

Therefore the equation of the plane passing through P and perpendicular
to N is

$$-x + y + 3z = -2 + 1 - 3$$

or

$$-x + y + 3z = -4.$$
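The equation X·N = P·N is determined by the normal vector N and the single number P·N. A minimal sketch, with hypothetical function names, stores a plane as that pair and tests whether a point lies on it:

```python
def dot(a, b):
    """Scalar product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def plane_through(p, n):
    """Return (n, d) describing the plane {X : X.n = d} through p
    and perpendicular to n, where d = p.n."""
    if all(x == 0 for x in n):
        raise ValueError("the normal vector must be non-zero")
    return tuple(n), dot(p, n)


def on_plane(x, plane):
    """True if the point x satisfies the equation of the plane."""
    n, d = plane
    return dot(x, n) == d
```

For P = (2, 1, −1) and N = (−1, 1, 3), `plane_through` returns the coefficients (−1, 1, 3) together with the right-hand side P·N of the equation in Example 1.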


Observe that in 2-space, with X = (x, y), the formulas lead to the
equation of the line in the ordinary sense.


Example 2. The equation of the line in the (x, y)-plane, passing
through (4, −3) and perpendicular to (−5, 2) is

$$-5x + 2y = -20 - 6 = -26.$$

We are now in position to interpret the coefficients (−5, 2) of x and y
in this equation. They give rise to a vector perpendicular to the line. In
any equation 


ax + by=c 
the vector (a,b) is perpendicular to the line determined by the equation. 


Similarly, in 3-space, the vector (a,b,c) is perpendicular to the plane 
determined by the equation 


ax + by + cz =d. 


Example 3. The plane determined by the equation

$$2x - y + 3z = 5$$

is perpendicular to the vector (2, −1, 3). If we want to find a point in
that plane, we of course have many choices. We can give arbitrary values
to x and y, and then solve for z. To get a concrete point, let x = 1,
y = 1. Then we solve for z, namely

$$3z = 5 - 2 + 1 = 4,$$

so that z = 4/3. Thus

$$\left(1, 1, \tfrac43\right)$$


is a point in the plane. 


In n-space, the equation $X \cdot N = P \cdot N$ is said to be the equation of a
hyperplane. For example,

$$3x - y + z + 2w = 5$$

is the equation of a hyperplane in 4-space, perpendicular to (3, −1, 1, 2).

Two vectors A, B are said to be parallel if there exists a number $c \neq 0$
such that cA = B. Two lines are said to be parallel if, given two distinct
points $P_1$, $Q_1$ on the first line and $P_2$, $Q_2$ on the second, the vectors

$$P_1 - Q_1 \quad\text{and}\quad P_2 - Q_2$$

are parallel.
Two planes are said to be parallel (in 3-space) if their normal vectors 
are parallel. They are said to be perpendicular if their normal vectors are 


perpendicular. The angle between two planes is defined to be the angle
between their normal vectors.

Example 4. Find the cosine of the angle $\theta$ between the planes

$$2x - y + z = 0,$$
$$x + 2y - z = 1.$$

This cosine is the cosine of the angle between the vectors

A = (2, −1, 1) and B = (1, 2, −1).

Therefore

$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = -\frac{1}{6}.$$


Example 5. Let

$$Q = (1, 1, 1) \quad\text{and}\quad P = (1, -1, 2).$$

Let

$$N = (1, 2, 3).$$

Find the point of intersection of the line through P in the direction of N,
and the plane through Q perpendicular to N.

The parametric representation of the line through P in the direction of
N is

(1)   $X = P + tN.$

The equation of the plane through Q perpendicular to N is

(2)   $(X - Q) \cdot N = 0.$


We visualize the line and plane as follows:

Figure 33

We must find the value of t such that the vector X in (1) also satisfies
(2), that is

$$(P + tN - Q) \cdot N = 0,$$

or after using the rules of the dot product,

$$(P - Q) \cdot N + tN \cdot N = 0.$$

Solving for t yields

$$t = \frac{(Q - P) \cdot N}{N \cdot N} = \frac{1}{14}.$$

Thus the desired point of intersection is

$$P + tN = (1, -1, 2) + \tfrac{1}{14}(1, 2, 3) = \left(\tfrac{15}{14}, -\tfrac{12}{14}, \tfrac{31}{14}\right).$$
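The same substitution works when the line has an arbitrary direction A rather than the normal N itself: putting X = P + tA into (X − Q)·N = 0 gives t = (Q − P)·N / (A·N), provided A·N is not 0. The sketch below implements this slightly more general case, which is an assumption going beyond the example; integer coordinates are assumed so that exact fractions can be used.

```python
from fractions import Fraction


def dot(a, b):
    """Scalar product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def intersect_line_plane(p, a, q, n):
    """Point where the line X = P + tA meets the plane (X - Q).N = 0."""
    an = dot(a, n)
    if an == 0:
        raise ValueError("the line is parallel to the plane")
    q_minus_p = [x - y for x, y in zip(q, p)]
    t = Fraction(dot(q_minus_p, n), an)
    return tuple(x + t * y for x, y in zip(p, a))
```

With P = (1, −1, 2), A = N = (1, 2, 3) and Q = (1, 1, 1) this reproduces the situation of Example 5; note that fractions are returned in lowest terms.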


Example 6. Find the equation of the plane passing through the three
points

$$P_1 = (1, 2, -1), \quad P_2 = (-1, 1, 4), \quad P_3 = (1, 3, -2).$$

We visualize schematically the three points as follows:

Figure 34

Then we find a vector N perpendicular to $\overrightarrow{P_1P_2}$ and $\overrightarrow{P_1P_3}$, or in other
words, perpendicular to $P_2 - P_1$ and $P_3 - P_1$. We have

$$P_2 - P_1 = (-2, -1, 5),$$
$$P_3 - P_1 = (0, 1, -1).$$

Let N = (a, b, c). We must solve

$$N \cdot (P_2 - P_1) = 0 \quad\text{and}\quad N \cdot (P_3 - P_1) = 0,$$

in other words,

$$-2a - b + 5c = 0,$$
$$b - c = 0.$$

We take b = c = 1 and solve for a = 2. Then

$$N = (2, 1, 1)$$

satisfies our requirements. The plane perpendicular to N, passing
through $P_1$ is the desired plane. Its equation is therefore $X \cdot N = P_1 \cdot N$,
that is

$$2x + y + z = 2 + 2 - 1 = 3.$$
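In 3-space a vector perpendicular to two given vectors can also be obtained from the cross product. This is a different technique from the one used in the example, which solves two linear equations by hand; it is shown here only as a sketch, with hypothetical function names. The normal it returns may be a non-zero multiple of the one found by hand, which describes the same plane.

```python
def sub(a, b):
    """Difference a - b of two vectors."""
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    """Scalar product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def cross(u, v):
    """Cross product of two vectors in 3-space; perpendicular to both."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def plane_through_points(p1, p2, p3):
    """Return (n, d) with X.n = d the plane through three points."""
    n = cross(sub(p2, p1), sub(p3, p1))
    if n == (0, 0, 0):
        raise ValueError("the three points lie on one line")
    return n, dot(p1, n)
```

A quick check of any result is to verify that all three points satisfy X·n = d.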


Distance between a point and a plane. Consider a plane defined by the 
equation 


$$(X - P) \cdot N = 0,$$


and let Q be an arbitrary point. We wish to find a formula for the 
distance between Q and the plane. By this we mean the length of the 
segment from Q to the point of intersection of the perpendicular line to 
the plane through Q, as on the figure. We let Q’ be this point of inter- 
section. 


Figure 35 


From the geometry, we have: 


length of the segment $\overline{QQ'}$ = length of the projection of $\overrightarrow{QP}$ on $\overrightarrow{QQ'}$.

We can express the length of this projection in terms of the dot product
as follows. A unit vector in the direction of N, which is perpendicular to
the plane, is given by $N/\|N\|$. Then

length of the projection of $\overrightarrow{QP}$ on $\overrightarrow{QQ'}$
= norm of the projection of $Q - P$ on $N/\|N\|$

$$= \left| (Q - P) \cdot \frac{N}{\|N\|} \right|.$$

This can also be written in the form:

$$\text{distance between } Q \text{ and the plane} = \frac{|(Q - P) \cdot N|}{\|N\|}.$$


Example 7. Let

$$Q = (1, 3, 5), \quad P = (-1, 1, 7) \quad\text{and}\quad N = (-1, 1, -1).$$

The equation of the plane is

$$-x + y - z = -5.$$

We find $\|N\| = \sqrt{3}$,

$$Q - P = (2, 2, -2) \quad\text{and}\quad (Q - P) \cdot N = -2 + 2 + 2 = 2.$$

Hence the distance between Q and the plane is $2/\sqrt{3}$.
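The distance formula can be evaluated directly. The following floating-point sketch assumes the plane is given by a point P on it and a non-zero normal N; the function name is illustrative.

```python
import math


def dot(a, b):
    """Scalar product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def distance_point_plane(q, p, n):
    """Distance |(Q - P).N| / |N| from q to the plane (X - P).N = 0."""
    length = math.sqrt(dot(n, n))
    if length == 0:
        raise ValueError("the normal vector must be non-zero")
    q_minus_p = [x - y for x, y in zip(q, p)]
    return abs(dot(q_minus_p, n)) / length
```

For the data of Example 7, `distance_point_plane((1, 3, 5), (-1, 1, 7), (-1, 1, -1))` evaluates the same quotient as in the text; a point lying on the plane gives distance 0.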


Exercises I, §6 


1. Show that the lines 2x + 3y = 1 and 5x — 5y =7 are not perpendicular. 


2. Let y = mx + b and y = m'x + c be the equations of two lines in the plane.
Write down vectors perpendicular to these lines. Show that these vectors are
perpendicular to each other if and only if mm' = −1.


Find the equation of the line in 2-space, perpendicular to N and passing through 
P, for the following values of N and P. 


3. N = (1, −1), P = (−5, 3)     4. N = (−5, 4), P = (3, 2)

5. Show that the lines

$$3x - 5y = 1, \qquad 2x + 3y = 5$$

are not perpendicular.

6. Which of the following pairs of lines are perpendicular?


(a) 3x − 5y = 1 and 2x + y = 2
(b) 2x + 7y = 1 and x − y = 5
(c) 3x − 5y = 1 and 5x + 3y = 7
(d) −x + y = 2 and x + y = 9


7. Find the equation of the plane perpendicular to the given vector N and
passing through the given point P.
(a) N = (1, −1, 3), P = (4, 2, −1)
(b) N = (−3, −2, 4), P = (2, π, −5)
(c) N = (−1, 0, 5), P = (2, 3, 7)


8. Find the equation of the plane passing through the following three points.
(a) (2, 1, 1), (3, −1, 1), (4, 1, −1)
(b) (−2, 3, −1), (2, 2, 3), (−4, −1, 1)
(c) (−5, −1, 2), (1, 2, −1), (3, −1, 2)


9. Find a vector perpendicular to (1, 2, −3) and (2, −1, 3), and another vector
perpendicular to (−1, 3, 2) and (2, 1, 1).


10. Find a vector parallel to the line of intersection of the two planes

2x − y + z = 1,   3x + y + z = 2.

11. Same question for the planes,

2x + y + 5z = 2,   3x − 2y + 2z = 3.

12. Find a parametric representation for the line of intersection of the planes of
Exercises 10 and 11.


13. Find the cosine of the angle between the following planes:


(a x+y+z=1 (b) 2x + 3y—z=2 
seys SS x— y+z=l 
(c) x+2y—z=1 (d) 2x+y+z=3 
—x+3y+z=2 —xX—yctzz-m 


14. (a) Let P = (1, 3, 5) and A = (−2, 1, 1). Find the intersection of the line
through P in the direction of A, and the plane 2x + 3y − z = 1.
(b) Let P = (1, 2, −1). Find the point of intersection of the plane

3x − 4y + z = 2,

with the line through P, perpendicular to that plane.


15. Let Q = (1, −1, 2), P = (1, 3, −2) and N = (1, 2, 2). Find the point of the
intersection of the line through P in the direction of N, and the plane
through Q perpendicular to N.


16. Find the distance between the indicated point and plane.
(a) (1, 1,2) and 3x + y — 5z = 2 

(b) (—1,3, 2) and 2x —4y+z=1 

(c) (3, —2, 1) and the yz-plane 

(d) (—3, —2, 1) and the yz-plane 


CHAPTER II


Matrices and Linear Equations


You have met linear equations in elementary school. Linear equations 
are simply equations like 


2x 4- y z=1, 
5x —y t S0. 


You have learned to solve such equations by the successive elimination 
of the variables. In this chapter, we shall review the theory of such 
equations, dealing with equations in n variables, and interpreting our 
results from the point of view of vectors. Several geometric interpreta- 
tions for the solutions of the equations will be given. 

The first chapter is used here very little, and can be entirely omitted if 
you know only the definition of the dot product between two n-tuples. 
The multiplication of matrices will be formulated in terms of such a 
product. One geometric interpretation for the solutions of homogeneous 
equations will however rely on the fact that the dot product between two 
vectors is 0 if and only if the vectors are perpendicular, so if you are
interested in this interpretation, you should refer to the section in
Chapter I where this is explained.

II, §1. Matrices


We consider a new kind of object, matrices. 
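Let n, m be two integers $\geq 1$. An array of numbers

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{pmatrix}$$

is called a matrix. We can abbreviate the notation for this matrix by
writing it $(a_{ij})$, i = 1,...,m and j = 1,...,n. We say that it is an m by n
matrix, or an m × n matrix. The matrix has m rows and n columns. For
instance, the first column is

$$\begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{pmatrix}$$

and the second row is $(a_{21}, a_{22}, \ldots, a_{2n})$. We call $a_{ij}$ the ij-entry or
ij-component of the matrix.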

Look back at Chapter I, §1. The example of 7-space taken from economics
gives rise to a 7 × 7 matrix $(a_{ij})$ (i, j = 1,...,7), if we define $a_{ij}$ to
be the amount spent by the i-th industry on the j-th industry. Thus
keeping the notation of that example, if a,, = 50, this means that the 
auto industry bought 50 million dollars worth of stuff from the chemical 
industry during the given year. 


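Example 1. The following is a 2 × 3 matrix:

$$\begin{pmatrix} 1 & 1 & -2 \\ -1 & 4 & -5 \end{pmatrix}.$$

It has two rows and three columns.
The rows are (1, 1, −2) and (−1, 4, −5). The columns are

$$\begin{pmatrix} 1 \\ -1 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 4 \end{pmatrix}, \quad \begin{pmatrix} -2 \\ -5 \end{pmatrix}.$$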


Thus the rows of a matrix may be viewed as n-tuples, and the columns 
may be viewed as vertical m-tuples. A vertical m-tuple is also called a 
column vector.

A vector $(x_1, \ldots, x_n)$ is a 1 × n matrix. A column vector

$$\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$$

is an n × 1 matrix.
When we write a matrix in the form $(a_{ij})$, then i denotes the row and
j denotes the column. In Example 1, we have for instance

$$a_{11} = 1, \qquad a_{23} = -5.$$


A single number (a) may be viewed as a 1 x 1 matrix. 
Let $(a_{ij})$, i = 1,...,m and j = 1,...,n be a matrix. If m = n, then we
say that it is a square matrix. Thus


1 —1 5 
4 and 2 1 —1 
3 1 —1 
are both square matrices. 


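We define the zero matrix to be the matrix such that $a_{ij} = 0$ for all
i, j. It looks like this:

$$\begin{pmatrix} 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix}$$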

We shall write it O. We note that we have met so far with the zero 
number, zero vector, and zero matrix. 

We shall now define addition of matrices and multiplication of ma- 
trices by numbers. 

We define addition of matrices only when they have the same size. 
Thus let m, n be fixed integers $\geq 1$. Let $A = (a_{ij})$ and $B = (b_{ij})$ be two
m × n matrices. We define A + B to be the matrix whose entry in the
i-th row and j-th column is $a_{ij} + b_{ij}$. In other words, we add matrices of
the same size componentwise. 


Example 2. Let 


Then

If A, B are both 1 × n matrices, i.e. n-tuples, then we note that our
addition of matrices coincides with the addition which we defined in
Chapter I for n-tuples.

If O is the zero matrix, then for any matrix A (of the same size, of
course), we have O + A = A + O = A.


This is trivially verified. We shall now define the multiplication of a 
matrix by a number. Let c be a number, and $A = (a_{ij})$ be a matrix. We
define cA to be the matrix whose ij-component is $ca_{ij}$. We write

$$cA = (ca_{ij}).$$


Thus we multiply each component of A by c. 
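Both operations act component by component, which makes them easy to sketch. In the code below a matrix is represented as a list of rows; this representation and the function names `add` and `scale` are assumptions made for illustration.

```python
def add(a, b):
    """Sum of two matrices of the same size, taken componentwise."""
    if len(a) != len(b) or any(len(r) != len(s) for r, s in zip(a, b)):
        raise ValueError("matrices must have the same size")
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]


def scale(c, a):
    """Product cA: every component of the matrix a is multiplied by c."""
    return [[c * x for x in row] for row in a]
```

For a made-up matrix such as `M = [[1, 2], [3, 4]]`, the call `add(M, scale(-1, M))` returns the zero matrix of the same size, in line with the relation between a matrix and its additive inverse.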


Example 3. Let A, B be as in Example 2. Let c = 2. Then 


2 —2 0 10 2 —2 
a 6 4 and 5-5 5 3) 


We also have 


Cas A= (1 i) 
Uu. ed 


In general, for any matrix $A = (a_{ij})$ we let −A (minus A) be the matrix
$(-a_{ij})$. Since we have the relation $a_{ij} - a_{ij} = 0$ for numbers, we also get
the relation

$$A + (-A) = O$$


for matrices. The matrix — A is also called the additive inverse of A. 

We define one more notion related to a matrix. Let $A = (a_{ij})$ be an
m × n matrix. The n × m matrix $B = (b_{ji})$ such that $b_{ji} = a_{ij}$ is called the
transpose of A, and is also denoted by ${}^tA$. Taking the transpose of a
matrix amounts to changing rows into columns and vice versa. If A is
the matrix which we wrote down at the beginning of this section, then ${}^tA$
is the matrix

$$\begin{pmatrix} a_{11} & a_{21} & a_{31} & \cdots & a_{m1} \\ a_{12} & a_{22} & a_{32} & \cdots & a_{m2} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{1n} & a_{2n} & a_{3n} & \cdots & a_{mn} \end{pmatrix}.$$

To take a special case:


2.4 
(-( : d then 'A-2|1 3| 
0 5 


If A = (2, 1, −4) is a row vector, then

$${}^tA = \begin{pmatrix} 2 \\ 1 \\ -4 \end{pmatrix}$$

is a column vector.
A matrix A which is equal to its transpose, that is $A = {}^tA$, is called
symmetric. Such a matrix is necessarily a square matrix.
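Changing rows into columns is a one-line operation when a matrix is stored as a list of rows. The sketch below assumes that representation; `transpose` and `is_symmetric` are illustrative names.

```python
def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(column) for column in zip(*a)]


def is_symmetric(a):
    """True if the matrix (a list of lists) equals its transpose."""
    return a == transpose(a)
```

For the row vector above, `transpose([[2, 1, -4]])` gives the corresponding column vector, written as three rows with one entry each. A matrix that is not square can never pass `is_symmetric`, because its transpose has a different size.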


Remark on notation. I have written the transpose sign on the left,
because in many situations one considers the inverse of a matrix written
$A^{-1}$, and then it is easier to write ${}^tA^{-1}$ rather than $(A^{-1})^t$ or $(A^t)^{-1}$,
which are in fact equal. The mathematical community has no consensus
as to where the transpose sign should be placed, on the right or left.


Exercises II, §1


"ZNM M T EIE 
“Ke 0o 2) Hw SEXES 


1. Let 


2. Let 


Find A + B, 3B, —2B, A+ 2B, A — B, B — A. 


3. (a) Write down the row vectors and column vectors of the matrices A, B in 
Exercise 1. 

(b) Write down the row vectors and column vectors of the matrices A, B in 
Exercise 2. 


4. (a) In Exercise 1, find ${}^tA$ and ${}^tB$.
(b) In Exercise 2, find ${}^tA$ and ${}^tB$.


5. If A, B are arbitrary m × n matrices, show that

$${}^t(A + B) = {}^tA + {}^tB.$$

6. If c is a number, show that ${}^t(cA) = c\,{}^tA$.


7. If $A = (a_{ij})$ is a square matrix, then the elements $a_{ii}$ are called the diagonal
elements. How do the diagonal elements of A and ${}^tA$ differ?


8. Find ${}^t(A + B)$ and ${}^tA + {}^tB$ in Exercise 2.
9. Find $A + {}^tA$ and $B + {}^tB$ in Exercise 2.


10. (a) Show that for any square matrix, the matrix $A + {}^tA$ is symmetric.
(b) Define a matrix A to be skew-symmetric if ${}^tA = -A$. Show that for any
square matrix A, the matrix $A - {}^tA$ is skew-symmetric.
(c) If a matrix is skew-symmetric, what can you say about its diagonal elements?
11. Let 


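$$E_1 = (1, 0, \ldots, 0), \quad E_2 = (0, 1, 0, \ldots, 0), \quad \ldots, \quad E_n = (0, \ldots, 0, 1)$$

be the standard unit vectors of $\mathbf{R}^n$. Let $x_1, \ldots, x_n$ be numbers. What is
$x_1E_1 + \cdots + x_nE_n$? Show that if

$$x_1E_1 + \cdots + x_nE_n = O$$

then $x_i = 0$ for all i.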


II, §2. Multiplication of Matrices 


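We shall now define the product of matrices. Let $A = (a_{ij})$, i = 1,...,m
and j = 1,...,n be an m × n matrix. Let $B = (b_{jk})$, j = 1,...,n and let
k = 1,...,s be an n × s matrix:

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}, \qquad B = \begin{pmatrix} b_{11} & \cdots & b_{1s} \\ \vdots & & \vdots \\ b_{n1} & \cdots & b_{ns} \end{pmatrix}.$$

We define the product AB to be the m × s matrix whose ik-coordinate is

$$\sum_{j=1}^{n} a_{ij}b_{jk} = a_{i1}b_{1k} + a_{i2}b_{2k} + \cdots + a_{in}b_{nk}.$$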


If $A_1, \ldots, A_m$ are the row vectors of the matrix A, and if $B^1, \ldots, B^s$ are the
column vectors of the matrix B, then the ik-coordinate of the product
AB is equal to $A_i \cdot B^k$. Thus

$$AB = \begin{pmatrix} A_1 \cdot B^1 & \cdots & A_1 \cdot B^s \\ \vdots & & \vdots \\ A_m \cdot B^1 & \cdots & A_m \cdot B^s \end{pmatrix}.$$

Multiplication of matrices is therefore a generalization of the dot
product.


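Example. Let

$$A = \begin{pmatrix} 2 & 1 & 5 \\ 1 & 3 & 2 \end{pmatrix}, \qquad B = \begin{pmatrix} 3 & 4 \\ -1 & 2 \\ 2 & 1 \end{pmatrix}.$$

Then

$$AB = \begin{pmatrix} 2 & 1 & 5 \\ 1 & 3 & 2 \end{pmatrix} \begin{pmatrix} 3 & 4 \\ -1 & 2 \\ 2 & 1 \end{pmatrix} = \begin{pmatrix} 15 & 15 \\ 4 & 12 \end{pmatrix}.$$

Example. Let

$$C = \begin{pmatrix} 1 & 3 \\ -1 & -1 \end{pmatrix}.$$

Let A, B be as in Example 1. Then

$$BC = \begin{pmatrix} 3 & 4 \\ -1 & 2 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 1 & 3 \\ -1 & -1 \end{pmatrix} = \begin{pmatrix} -1 & 5 \\ -3 & -5 \\ 1 & 5 \end{pmatrix}$$

and

$$A(BC) = \begin{pmatrix} 2 & 1 & 5 \\ 1 & 3 & 2 \end{pmatrix} \begin{pmatrix} -1 & 5 \\ -3 & -5 \\ 1 & 5 \end{pmatrix} = \begin{pmatrix} 0 & 30 \\ -8 & 0 \end{pmatrix}.$$

Compute (AB)C. What do you find?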
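The definition of the product as "row of A dotted with column of B" can be written out in a few lines. In this sketch matrices are lists of rows, and the three matrices A, B, C are entered as they are read from the two examples above, which is an assumption about the printed values.

```python
def matmul(a, b):
    """Product AB: the ik-entry is the dot product of row i of a
    with column k of b."""
    n = len(b)
    if any(len(row) != n for row in a):
        raise ValueError("columns of A must match rows of B")
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns]
            for row in a]


# Matrices assumed to match the two examples.
A = [[2, 1, 5], [1, 3, 2]]
B = [[3, 4], [-1, 2], [2, 1]]
C = [[1, 3], [-1, -1]]

left = matmul(matmul(A, B), C)   # (AB)C
right = matmul(A, matmul(B, C))  # A(BC)
```

Comparing `left` with `right` carries out the computation that the second example asks for.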


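If $X = (x_1, \ldots, x_m)$ is a row vector, i.e. a 1 × m matrix, then we can
form the product XA, which looks like this:

$$(x_1, \ldots, x_m) \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} = (y_1, \ldots, y_n),$$

where

$$y_k = x_1a_{1k} + \cdots + x_ma_{mk}.$$

In this case, XA is a 1 × n matrix, i.e. a row vector.

On the other hand, if X is a column vector,

$$X = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},$$

then AX = Y where Y is also a column vector, whose coordinates are
given by

$$y_i = \sum_{j=1}^{n} a_{ij}x_j = a_{i1}x_1 + \cdots + a_{in}x_n.$$

Visually, the multiplication AX = Y looks like

$$\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}.$$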


Example. Linear equations. Matrices give a convenient way of writing 
linear equations. You should already have considered systems of linear 
equations. For instance, one equation like: 


3x —2y + 3z = 1, 


with three unknowns x, y, z. Or a system of two equations in three
unknowns

(*)   $3x - 2y + 3z = 1,$
      $-x + 7y - 4z = -5.$

In this example we let the matrix of coefficients be

$$A = \begin{pmatrix} 3 & -2 & 3 \\ -1 & 7 & -4 \end{pmatrix}.$$

Let B be the column vector of the numbers appearing on the right-hand
side, so

$$B = \begin{pmatrix} 1 \\ -5 \end{pmatrix}.$$

Let the vector of unknowns be the column vector

$$X = \begin{pmatrix} x \\ y \\ z \end{pmatrix}.$$



Then you can see that the system of two simultaneous equations can be 
written in the form 


AX = B. 


Example. The first equation of (*) represents equality of the first 
component of AX and B; whereas the second equation of (*) represents 
equality of the second component of AX and B. 


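In general, let $A = (a_{ij})$ be an m × n matrix, and let B be a column
vector of size m. Let

$$X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$

be a column vector of size n. Then the system of linear equations

$$\begin{aligned} a_{11}x_1 + \cdots + a_{1n}x_n &= b_1, \\ a_{21}x_1 + \cdots + a_{2n}x_n &= b_2, \\ &\;\;\vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n &= b_m, \end{aligned}$$

can be written in the more efficient way

$$AX = B,$$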

by the definition of multiplication of matrices. We shall see later how to 
solve such systems. We say that there are m equations and n unknowns, 
or n variables. 


Example. Markov matrices. A matrix can often be used to represent
a practical situation. Suppose we deal with three cities, say Los Angeles,
Chicago, and Boston, denoted by LA, Ch, and Bo. Suppose that in any
given year, some people leave each one of these cities to go to one of the
others. The percentages of people leaving and going are given as follows,
for each year.


LA goes to Bo and + LA goes to Ch. 
Ch goes to LA and 


B 


+ Ch goes to Bo. 
Bo goes to LA and . i Bo goes to Ch. 


Ol ne 
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Let x,, y,, z, be the populations of LA, Ch, and Bo, respectively, in the 
n-th year. Then we can express the population in the (n + 1)-th year as 
follows. 


In the (n + 1)-th year, 1 of the LA population leaves for Boston, and 
+ leaves for Chicago. The total fraction leaving LA during the year is 
therefore 


a eee ce 
4+ 7 = 28: 


Hence the total fraction remaining in LA is 


Similarly the fraction leaving Chicago each year is 


ee eee 
5b. T5 


so the fraction remaining is 7%. Finally, the fraction leaving Boston each 
year is 


1 T cd 
6 E 8 T 24> 
so the fraction remaining in Boston is 37. Thus 


— 1 zy. 1 
Yn+1 ES Xn + 15Yn + 84n» 


1 1 17 
Zn+1 = 4*n E 3 Yn EE 242n: 


Let A be the matrix 


17 1 1 

28: 5 6 

ef b — 07 fb 
dee 7 15 8 
i 1 17 

4 24 


Then we can write down more simply the population shift by the expression

\[ X_{n+1} = AX_n, \quad\text{where}\quad X_n = \begin{pmatrix} x_n\\ y_n\\ z_n \end{pmatrix}. \]

The change from $X_n$ to $X_{n+1}$ is called a Markov process. This is due to
the special property of the matrix A, all of whose components are $\geq 0$,
and such that the sum of all the elements in each column is equal to 1.
Such a matrix is called a Markov matrix.
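
As a quick check of the Markov property, here is a minimal Python sketch using exact fractions. The entries of `A` are those of the three-city example as read from the text; the starting populations in `X1` are hypothetical numbers chosen only so that the arithmetic comes out in whole people.

```python
from fractions import Fraction as F

# Markov matrix of the three-city example (order: LA, Ch, Bo).
# Column j lists where the population of city j ends up after one year.
A = [
    [F(17, 28), F(1, 5),  F(1, 6)],
    [F(1, 7),   F(7, 15), F(1, 8)],
    [F(1, 4),   F(1, 3),  F(17, 24)],
]

def apply(matrix, vector):
    """Return the product of a matrix with a column vector (given as a list)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def column_sums(matrix):
    """Return the sum of the entries of each column."""
    return [sum(col) for col in zip(*matrix)]

# Hypothetical populations in the first year (an assumption for illustration).
X1 = [F(2800), F(1500), F(2400)]
X2 = apply(A, X1)

print(column_sums(A))   # every column sums to 1
print(X2, sum(X2))      # the total population is the same as sum(X1)
```

Because each column sums to 1, nobody is created or lost: the total of the entries of $X_{n+1}$ equals the total of the entries of $X_n$.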


If A is a square matrix, then we can form the product AA, which will
be a square matrix of the same size as A. It is denoted by $A^2$. Similarly,
we can form $A^3$, $A^4$, and in general, $A^n$ for any positive integer n. Thus
$A^n$ is the product of A with itself n times.

We can define the unit $n \times n$ matrix to be the matrix having diagonal
components all equal to 1, and all other components equal to 0. Thus
the unit $n \times n$ matrix, denoted by $I_n$, looks like this:

\[
I_n = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0\\
0 & 1 & 0 & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & 0 & \cdots & 1
\end{pmatrix}.
\]


We can then define $A^0 = I$ (the unit matrix of the same size as A). Note
that for any two integers $r, s \geq 0$ we have the usual relation

\[ A^rA^s = A^sA^r = A^{r+s}. \]


For example, in the Markov process described above, we may express
the population vector in the $(n + 1)$-th year as

\[ X_{n+1} = A^nX_1, \]

where $X_1$ is the population vector in the first year.
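
The relation between repeated steps and powers can be checked on a small case. The sketch below is only an illustration: the $2 \times 2$ matrix `A` and the vector `X1` are hypothetical, and `matmul`, `identity` and `power` are helper names introduced here, not notation from the text.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))]
            for i in range(len(A))]

def identity(n):
    """The unit n x n matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def power(A, n):
    """A multiplied with itself n times, with power(A, 0) the unit matrix."""
    result = identity(len(A))
    for _ in range(n):
        result = matmul(result, A)
    return result

A = [[1, 1],
     [0, 1]]
X1 = [[2], [3]]          # a column vector written as a 2 x 1 matrix

# Four single steps X -> AX, starting from X1 ...
X = X1
for _ in range(4):
    X = matmul(A, X)

# ... give the same vector as one multiplication by the fourth power of A.
print(X == matmul(power(A, 4), X1))
print(matmul(power(A, 2), power(A, 3)) == power(A, 5))
```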


Warning. It is not always true that AB = BA. For instance, compute
AB and BA in the following cases:

\[ A = \begin{pmatrix} 3 & 2\\ 0 & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} 2 & -1\\ 0 & 5 \end{pmatrix}. \]

You will find two different values. This is expressed by saying that
multiplication of matrices is not necessarily commutative. Of course, in some
special cases, we do have AB = BA. For instance, powers of A commute,
i.e. we have $A^rA^s = A^sA^r$ as already pointed out above.
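
The failure of commutativity is easy to see by machine. The following Python sketch computes both products for one illustrative pair of $2 \times 2$ matrices; `matmul` is a helper name introduced here.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))]
            for i in range(len(A))]

A = [[3, 2],
     [0, 1]]
B = [[2, -1],
     [0, 5]]

AB = matmul(A, B)
BA = matmul(B, A)
print(AB)        # [[6, 7], [0, 5]]
print(BA)        # [[6, 3], [0, 5]]
print(AB == BA)  # False
```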


We now prove other basic properties of multiplication.

Distributive law. Let A, B, C be matrices. Assume that A, B can be
multiplied, and A, C can be multiplied, and B, C can be added. Then A,
B + C can be multiplied, and we have

\[ A(B + C) = AB + AC. \]

If x is a number, then

\[ A(xB) = x(AB). \]

Proof. Let $A_i$ be the i-th row of A and let $B^k$, $C^k$ be the k-th column
of B and C, respectively. Then $B^k + C^k$ is the k-th column of B + C.
By definition, the ik-component of A(B + C) is $A_i \cdot (B^k + C^k)$. Since

\[ A_i \cdot (B^k + C^k) = A_i \cdot B^k + A_i \cdot C^k, \]

our first assertion follows. As for the second, observe that the k-th
column of xB is $xB^k$. Since

\[ A_i \cdot xB^k = x(A_i \cdot B^k), \]

our second assertion follows.


Associative law. Let A, B, C be matrices such that A, B can be multi- 
plied and B, C can be multiplied. Then A, BC can be multiplied. So 
can AB, C, and we have 


(AB)C = A(BC). 


Proof. Let $A = (a_{ij})$ be an $m \times n$ matrix, let $B = (b_{jk})$ be an $n \times r$
matrix, and let $C = (c_{kl})$ be an $r \times s$ matrix. The product AB is an $m \times r$
matrix, whose ik-component is equal to the sum

\[ a_{i1}b_{1k} + a_{i2}b_{2k} + \cdots + a_{in}b_{nk}. \]

We shall abbreviate this sum using our $\sum$ notation by writing

\[ \sum_{j=1}^{n} a_{ij}b_{jk}. \]

By definition, the il-component of (AB)C is equal to

\[ \sum_{k=1}^{r} \Bigl[ \sum_{j=1}^{n} a_{ij}b_{jk} \Bigr] c_{kl} = \sum_{k=1}^{r} \Bigl[ \sum_{j=1}^{n} a_{ij}b_{jk}c_{kl} \Bigr]. \]

The sum on the right can also be described as the sum of all terms

\[ a_{ij}b_{jk}c_{kl}, \]

where j, k range over all integers $1 \leq j \leq n$ and $1 \leq k \leq r$, respectively.

If we had started with the jl-component of BC and then computed the
il-component of A(BC) we would have found exactly the same sum,
thereby proving the desired property.


The above properties are very similar to those of multiplication of 
numbers, except that the commutative law does not hold. 
We can also relate multiplication with the transpose: 


Let A, B be matrices of a size such that AB is defined. Then

\[ {}^t(AB) = {}^tB\,{}^tA. \]

In other words, the transpose of the product is equal to the product of
the transposes in reverse order.

Proof. Let $A = (a_{ij})$ and $B = (b_{jk})$. Then $AB = C = (c_{ik})$ where

\[ c_{ik} = a_{i1}b_{1k} + \cdots + a_{in}b_{nk} = b_{1k}a_{i1} + \cdots + b_{nk}a_{in}. \]

Let ${}^tA = (a'_{ji})$, ${}^tB = (b'_{kj})$, and ${}^tC = (c'_{ki})$. Then

\[ a'_{ji} = a_{ij}, \qquad b'_{kj} = b_{jk}, \qquad c'_{ki} = c_{ik}. \]

Hence we can reread the above relation as

\[ c'_{ki} = b'_{k1}a'_{1i} + \cdots + b'_{kn}a'_{ni}, \]

which shows that ${}^tC = {}^tB\,{}^tA$, as desired.


Example. Instead of writing the system of linear equations AX = B in 
terms of column vectors, we can write it by taking the transpose, which 
gives 


\[ {}^tX\,{}^tA = {}^tB. \]

If X, B are column vectors, then ${}^tX$, ${}^tB$ are row vectors. It is occasionally
convenient to rewrite the system in this fashion.
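
The rule for the transpose of a product can be tested on any pair of matrices for which the product is defined. In the sketch below the matrices `A`, `B` and the column vector `X` are hypothetical examples, and `matmul` and `transpose` are helper names introduced here.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    """Interchange the rows and columns of A."""
    return [list(column) for column in zip(*A)]

A = [[1, 2, 0],
     [3, -1, 4]]      # 2 x 3
B = [[2, 1],
     [0, 5],
     [-1, 3]]         # 3 x 2

# Transpose of the product equals the product of the transposes in reverse order.
print(transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)))

# The same rule turns a column-vector equation AX = C into a row-vector one.
X = [[1], [1], [2]]   # 3 x 1 column vector
C = matmul(A, X)
print(matmul(transpose(X), transpose(A)) == transpose(C))
```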


Unlike division with non-zero numbers, we cannot divide by a matrix,
any more than we could divide by a vector (n-tuple). Under certain
circumstances, we can define an inverse as follows. We do this only for
square matrices. Let A be an $n \times n$ matrix. An inverse for A is a matrix
B such that

\[ AB = BA = I. \]

Since we multiplied A with B on both sides, the only way this can make
sense is if B is also an $n \times n$ matrix. Some matrices do not have
inverses. However, if an inverse exists, then there is only one (we say that
the inverse is unique, or uniquely determined by A). This is easy to prove.
Suppose that B, C are inverses, so we have

\[ AB = BA = I \quad\text{and}\quad AC = CA = I. \]

Multiply the equation BA = I on the right with C. Then

\[ BAC = IC = C, \]

and we have assumed that AC = I, so BAC = BI = B. This proves that
B = C. In light of this, the inverse is denoted by

\[ A^{-1}. \]

Then $A^{-1}$ is the unique matrix such that

\[ AA^{-1} = A^{-1}A = I. \]

We shall prove later that if A, B are square matrices of the same size
such that AB = I then it follows that also

\[ BA = I. \]


In other words, if B is a right inverse for A, then it is also a left inverse. 
You may assume this for the time being. Thus in verifying that a 
matrix is the inverse of another, you need only do so on one side. 

We shall also find later a way of computing the inverse when it exists. 
It can be a tedious matter. 
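
Checking that a given matrix is an inverse is much easier than finding one. The Python sketch below tests the defining relation on both sides; the matrices `A` and `B` are hypothetical examples, and `is_inverse` is a helper name introduced here.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))]
            for i in range(len(A))]

def identity(n):
    """The unit n x n matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def is_inverse(A, B):
    """True when AB and BA are both equal to the unit matrix."""
    n = len(A)
    return matmul(A, B) == identity(n) and matmul(B, A) == identity(n)

A = [[2, 1],
     [1, 1]]
B = [[1, -1],
     [-1, 2]]

print(is_inverse(A, B))   # True
print(is_inverse(A, A))   # False
```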


Let c be a number. Then the matrix

\[
\begin{pmatrix}
c & 0 & \cdots & 0\\
0 & c & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & c
\end{pmatrix}
\]

having component c on each diagonal entry and 0 otherwise is called a
scalar matrix. We can also write it as cI, where I is the unit $n \times n$
matrix. Cf. Exercise 6.

As an application of the formula for the transpose of a product, we 
shall now see that: 


The transpose of an inverse is the inverse of the transpose, that is

\[ {}^t(A^{-1}) = ({}^tA)^{-1}. \]

Proof. Take the transpose of the relation $AA^{-1} = I$. Then by the rule
for the transpose of a product, we get

\[ {}^t(A^{-1})\,{}^tA = {}^tI = I \]

because I is equal to its own transpose. Similarly, applying the transpose
to the relation $A^{-1}A = I$ yields

\[ {}^tA\,{}^t(A^{-1}) = {}^tI = I. \]

Hence ${}^t(A^{-1})$ is an inverse for ${}^tA$, as was to be shown.

In light of this result, it is customary to omit the parentheses, and to
write

\[ {}^tA^{-1} \]


for the inverse of the transpose, which we have seen is equal to the 
transpose of the inverse. 


We end this section with an important example of multiplication of 
matrices. 


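Example. Rotations. A special type of $2 \times 2$ matrix represents rotations.
For each number $\theta$, let $R(\theta)$ be the matrix

\[ R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{pmatrix}. \]

Let $X = \begin{pmatrix} x\\ y \end{pmatrix}$ be a point on the unit circle. We may write its
coordinates x, y in the form

\[ x = \cos\varphi, \qquad y = \sin\varphi \]

for some number $\varphi$. Then we get, by matrix multiplication:

\[
R(\theta)\begin{pmatrix} x\\ y \end{pmatrix}
= \begin{pmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} \cos\varphi\\ \sin\varphi \end{pmatrix}
= \begin{pmatrix} \cos(\theta + \varphi)\\ \sin(\theta + \varphi) \end{pmatrix}.
\]

This follows from the addition formula for sine and cosine, namely

\[
\begin{aligned}
\cos(\theta + \varphi) &= \cos\theta\cos\varphi - \sin\theta\sin\varphi,\\
\sin(\theta + \varphi) &= \sin\theta\cos\varphi + \cos\theta\sin\varphi.
\end{aligned}
\]

An arbitrary point in $\mathbf{R}^2$ can be written in the form

\[ rX = \begin{pmatrix} r\cos\varphi\\ r\sin\varphi \end{pmatrix}, \]

where r is a number $\geq 0$. Since

\[ R(\theta)rX = rR(\theta)X, \]

we see that multiplication by $R(\theta)$ also has the effect of rotating rX by
an angle $\theta$. Thus rotation by an angle $\theta$ can be represented by the
matrix $R(\theta)$.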


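\[ R(\theta)X = {}^t(\cos(\theta + \varphi), \sin(\theta + \varphi)) \]

Figure 1

Note that for typographical reasons, we have written the vector ${}^tX$
horizontally, but have put a little t on the upper left superscript, to
denote transpose, so X is a column vector.

Example. The matrix corresponding to rotation by an angle of $\pi/3$ is
given by

\[
R(\pi/3) = \begin{pmatrix} \cos\pi/3 & -\sin\pi/3\\ \sin\pi/3 & \cos\pi/3 \end{pmatrix}
= \begin{pmatrix} 1/2 & -\sqrt{3}/2\\ \sqrt{3}/2 & 1/2 \end{pmatrix}.
\]

Example. Let $X = {}^t(2, 5)$. If you rotate X by an angle of $\pi/3$, find the
coordinates of the rotated vector.
These coordinates are:

\[
R(\pi/3)X = \begin{pmatrix} 1/2 & -\sqrt{3}/2\\ \sqrt{3}/2 & 1/2 \end{pmatrix}
\begin{pmatrix} 2\\ 5 \end{pmatrix}
= \begin{pmatrix} 1 - 5\sqrt{3}/2\\ \sqrt{3} + 5/2 \end{pmatrix}.
\]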


Warning. Note how we multiply the column vector on the left with
the matrix $R(\theta)$. If you want to work with row vectors, then take the
transpose and verify directly that

\[
(2, 5)\begin{pmatrix} \cos\pi/3 & \sin\pi/3\\ -\sin\pi/3 & \cos\pi/3 \end{pmatrix}
= (2, 5)\begin{pmatrix} 1/2 & \sqrt{3}/2\\ -\sqrt{3}/2 & 1/2 \end{pmatrix}
= (1 - 5\sqrt{3}/2,\ \sqrt{3} + 5/2).
\]

So the matrix $R(\theta)$ gets transposed. The minus sign is now in the lower
left-hand corner.
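
The two ways of writing the rotation, with a column vector on the right of $R(\theta)$ or a row vector on the left of its transpose, can be compared numerically. This is a minimal sketch in floating point; the function names are introduced here for illustration.

```python
import math

def rotation(theta):
    """The 2 x 2 matrix R(theta) as a list of rows."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s,  c]]

def transpose(M):
    return [list(column) for column in zip(*M)]

def matrix_times_column(M, X):
    """M X for a column vector X given as a tuple."""
    return tuple(sum(M[i][j] * X[j] for j in range(len(X))) for i in range(len(M)))

def row_times_matrix(X, M):
    """X M for a row vector X given as a tuple."""
    return tuple(sum(X[i] * M[i][j] for i in range(len(X))) for j in range(len(M[0])))

X = (2, 5)
R = rotation(math.pi / 3)

as_column = matrix_times_column(R, X)        # R(theta) X
as_row = row_times_matrix(X, transpose(R))   # tX tR(theta)

print(as_column)
print(as_row)
print(math.hypot(*X), math.hypot(*as_column))   # the length is unchanged
```

Both computations give the coordinates $1 - 5\sqrt{3}/2$ and $\sqrt{3} + 5/2$, up to rounding.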


Exercises II, §2 


The following exercises give mostly routine practice in the multiplication of
matrices. However, they also illustrate some more theoretical aspects of this
multiplication. Therefore they should all be worked out. Specifically:

Exercises 7 through 12 illustrate multiplication by the standard unit vectors. 

Exercises 14 through 19 illustrate multiplication of triangular matrices. 

Exercises 24 through 27 illustrate how addition of numbers is transformed 
into multiplication of matrices. 

Exercises 27 through 32 illustrate rotations. 

Exercises 33 through 37 illustrate elementary matrices, and should be worked
out before studying §5.

1. Let I be the unit $n \times n$ matrix. Let A be an $n \times r$ matrix. What is IA? If A
is an $m \times n$ matrix, what is AI?


2. Let O be the matrix all of whose coordinates are 0. Let A be a matrix of a
size such that the product AO is defined. What is AO?

3. In each one of the following cases, find (AB)C and A(BC).

[The matrices A, B, C of parts (a), (b), (c) are illegible in the source.]

4. Let A, B be square matrices of the same size, and assume that AB = BA. 
Show that 


\[ (A + B)^2 = A^2 + 2AB + B^2 \quad\text{and}\quad (A + B)(A - B) = A^2 - B^2, \]


using the distributive law. 


5. Let

\[ A = \begin{pmatrix} 1 & 2\\ 3 & -1 \end{pmatrix}, \qquad B = \begin{pmatrix} 2 & 0\\ 1 & 1 \end{pmatrix}. \]

Find AB and BA.
6. Let 


Let A, B be as in Exercise 5. Find CA, AC, CB, and BC. State the general 
rule including this exercise as a special case. 


7. Let X = (1, 0, 0) and let

\[ A = \begin{pmatrix} 3 & 1 & 5\\ 2 & 0 & 1\\ 1 & 1 & 1 \end{pmatrix}. \]

What is XA?


8. Let X = (0, 1,0), and let A be an arbitrary 3 x 3 matrix. How would you 
describe XA? What if X = (0,0,1)? Generalize to similar statements con- 
cerning n x n matrices, and their products with unit vectors. 


9. Let 


Find AX for each of the following values of X.

\[
\text{(a) } X = \begin{pmatrix} 1\\ 0\\ 0 \end{pmatrix} \qquad
\text{(b) } X = \begin{pmatrix} 0\\ 1\\ 1 \end{pmatrix} \qquad
\text{(c) } X = \begin{pmatrix} 0\\ 0\\ 1 \end{pmatrix}
\]

10. Let

\[ A = \begin{pmatrix} 3 & 7 & 5\\ 1 & -1 & 4\\ 2 & 1 & 8 \end{pmatrix}. \]

Find AX for each of the values of X given in Exercise 9.

11. Let X and A be as shown. What is AX?

[The vector X and the matrix A of this exercise are illegible in the source.]

12. Let X be a column vector having all its components equal to 0 except the
j-th component which is equal to 1. Let A be an arbitrary matrix, whose size
is such that we can form the product AX. What is AX?

13. Let X be the indicated column vector, and A the indicated matrix. Find AX
as a column vector.

[The vectors X and matrices A of parts (a) through (d) are illegible in the source.]

14. Let $A = \begin{pmatrix} a & b\\ c & d \end{pmatrix}$. Find the product AS for each one of the following
matrices S. Describe in words the effect on A of this product.

\[ \text{(a) } S = \begin{pmatrix} 1 & x\\ 0 & 1 \end{pmatrix} \qquad \text{(b) } S = \begin{pmatrix} 1 & 0\\ x & 1 \end{pmatrix} \]

15. Let $A = \begin{pmatrix} a & b\\ c & d \end{pmatrix}$ again. Find the product SA for each one of the following
matrices S. Describe in words the effect of this product on A.

\[ \text{(a) } S = \begin{pmatrix} 1 & x\\ 0 & 1 \end{pmatrix} \qquad \text{(b) } S = \begin{pmatrix} 1 & 0\\ x & 1 \end{pmatrix} \]

16. (a) Let A be the matrix

\[ \begin{pmatrix} 0 & 1 & 1\\ 0 & 0 & 1\\ 0 & 0 & 0 \end{pmatrix}. \]

Find $A^2$, $A^3$. Generalize to $4 \times 4$ matrices.

(b) Let A be the matrix

\[ \begin{pmatrix} 1 & 1 & 1\\ 0 & 1 & 1\\ 0 & 0 & 1 \end{pmatrix}. \]

Compute $A^2$, $A^3$, $A^4$.

17. Let

\[ A = \begin{pmatrix} 1 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 3 \end{pmatrix}. \]

Find $A^2$, $A^3$, $A^4$.

18. Let A be a diagonal matrix, with diagonal elements $a_1, \ldots, a_n$. What is $A^2$,
$A^3$, $A^k$ for any positive integer k?

19. Let

\[ A = \begin{pmatrix} 0 & 1 & 6\\ 0 & 0 & 4\\ 0 & 0 & 0 \end{pmatrix}. \]

Find $A^3$.

20. (a) Find a $2 \times 2$ matrix A such that $A^2 = -I = \begin{pmatrix} -1 & 0\\ 0 & -1 \end{pmatrix}$.

(b) Determine all $2 \times 2$ matrices A such that $A^2 = O$.

21. Let A be a square matrix.

(a) If $A^2 = O$ show that I - A is invertible.

(b) If $A^3 = O$, show that I - A is invertible.

(c) In general, if $A^n = O$ for some positive integer n, show that I - A is
invertible. [Hint: Think of the geometric series.]

(d) Suppose that $A^2 + 2A + I = O$. Show that A is invertible.

(e) Suppose that $A^3 - A + I = O$. Show that A is invertible.

22. Let A, B be two square matrices of the same size. We say that A is similar
to B if there exists an invertible matrix T such that $B = TAT^{-1}$. Suppose
this is the case. Prove:

(a) B is similar to A.

(b) A is invertible if and only if B is invertible.

(c) ${}^tA$ is similar to ${}^tB$.

(d) Suppose $A^n = O$ and B is an invertible matrix of the same size as A.
Show that $(BAB^{-1})^n = O$.

23. Let A be a square matrix which is of the form

\[
\begin{pmatrix}
a_{11} & * & \cdots & *\\
0 & a_{22} & \cdots & *\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & a_{nn}
\end{pmatrix}.
\]

The notation means that all elements below the diagonal are equal to 0,
and the elements above the diagonal are arbitrary. One may express this
property by saying that

\[ a_{ij} = 0 \quad\text{if}\quad i > j. \]

Such a matrix is called upper triangular. If A, B are upper triangular
matrices (of the same size) what can you say about the diagonal elements of
AB?


Exercises 24 through 27 give examples where addition of numbers is trans- 
formed into multiplication of matrices. 


24. Let a, b be numbers, and let 


What is AB? What is $A^2$, $A^3$? What is $A^n$ where n is a positive integer?
25. Show that the matrix A in Exercise 24 has an inverse. What is this inverse? 


26. Show that if A, B are n x n matrices which have inverses, then AB has an 
inverse. 


27. Rotations. Let $R(\theta)$ be the matrix given by

\[ R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{pmatrix}. \]

(a) Show that for any two numbers $\theta_1$, $\theta_2$ we have

\[ R(\theta_1)R(\theta_2) = R(\theta_1 + \theta_2). \]

[You will have to use the addition formulas for sine and cosine.]
(b) Show that the matrix $R(\theta)$ has an inverse, and write down this inverse.
(c) Let $A = R(\theta)$. Show that

\[ A^2 = \begin{pmatrix} \cos 2\theta & -\sin 2\theta\\ \sin 2\theta & \cos 2\theta \end{pmatrix}. \]

(d) Determine $A^n$ for any positive integer n. Use induction.

28. Find the matrix $R(\theta)$ associated with the rotation for each of the following
values of $\theta$.
(a) $\pi/2$ (b) $\pi/4$ (c) $\pi$ (d) $-\pi$ (e) $-\pi/3$
(f) $\pi/6$ (g) $5\pi/4$


29. In general, let $\theta > 0$. What is the matrix associated with the rotation by an
angle $-\theta$ (i.e. clockwise rotation by $\theta$)?

30. Let X = (1, 2) be a point of the plane. If you rotate X by an angle of $\pi/4$,
what are the coordinates of the new point?

31. Same question when $X = {}^t(-1, 3)$ and the rotation is by an angle of $\pi/2$.

32. For any vector X in $\mathbf{R}^2$ let $Y = R(\theta)X$ be its rotation by an angle $\theta$. Show
that $\|Y\| = \|X\|$.

The following exercises on elementary matrices should be done before
studying §5.


33. Elementary matrices. Let A be the matrix shown. Let U be the matrix as
shown. In each case find UA.

[The matrix A and the matrices U of parts (a) through (f) are illegible in the source.]

34. Let E be the matrix as shown. Find EA where A is the same matrix as in
the preceding exercise.

[The matrices E of parts (a) through (d) are illegible in the source.]

35. Let E be the matrix as shown. Find EA where A is the same matrix as in
the preceding exercise and Exercise 33.

\[
\text{(a) } \begin{pmatrix} 3 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
\text{(b) } \begin{pmatrix} 1 & 0 & 3 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}
\]

\[
\text{(c) } \begin{pmatrix} 1 & 0 & 0 & 0\\ -2 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
\text{(d) } \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & -2 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}
\]

36. Let $A = (a_{ij})$ be an $m \times n$ matrix,

\[
A = \begin{pmatrix}
a_{11} & \cdots & a_{1n}\\
\vdots & & \vdots\\
a_{m1} & \cdots & a_{mn}
\end{pmatrix}.
\]

Let $1 \leq r \leq m$ and $1 \leq s \leq m$. Let $I_{rs}$ be the matrix whose rs-component is 1
and such that all other components are equal to 0.

(a) What is $I_{rs}A$?

(b) Suppose $r \neq s$. What is $(I_{rs} + I_{sr})A$?

(c) Suppose $r \neq s$. Let $I_{jj}$ be the matrix whose jj-component is 1 and such
that all other components are 0. Let

\[ E_{rs} = I_{rs} + I_{sr} + \text{sum of all } I_{jj} \text{ for } j \neq r,\ j \neq s. \]

What is $E_{rs}A$?

37. Again let $r \neq s$.
(a) Let $E = I + 3I_{rs}$. What is EA?
(b) Let c be any number. Let $E = I + cI_{rs}$. What is EA?


The rest of the chapter will be mostly concerned with linear equations,
and especially homogeneous ones. We shall find three ways of interpreting
such equations, illustrating three different ways of thinking about
matrices and vectors.

II, §3. Homogeneous Linear Equations and Elimination

In this section, we look at linear equations by one method of elimination.
In the next section, we shall discuss another method.

We shall be interested in the case when the number of unknowns is
greater than the number of equations, and we shall see that in that case,
there always exists a non-trivial solution.

Before dealing with the general case, we shall study examples.

Example 1. Suppose that we have a single equation, like

\[ 2x + y - 4z = 0. \]

We wish to find a solution with not all of x, y, z equal to 0. An
equivalent equation is

\[ 2x = -y + 4z. \]

To find a non-trivial solution, we give all the variables except the first a
special value $\neq 0$, say y = 1, z = 1. We then solve for x. We find

\[ 2x = -y + 4z = 3, \]

whence x = 3/2.

Example 2. Consider a pair of equations, say 
(1) 2x + 3y —z=0, 
(2) x+ y+z=0. 


We reduce the problem of solving these simultaneous equations to the
preceding case of one equation, by eliminating one variable. Thus we
multiply the second equation by 2 and subtract it from the first equation,
getting

(3) $y - 3z = 0$.

Now we meet one equation in more than one variable. We give z any
value $\neq 0$, say z = 1, and solve for y, namely y = 3. We then solve for x
from the second equation, namely x = -y - z, and obtain x = -4. The
values which we have obtained for x, y, z are also solutions of the first
equation, because the first equation is (in an obvious sense) the sum of
equation (2) multiplied by 2, and equation (3).


Example 3. We wish to find a solution for the system of equations

\[
\begin{aligned}
3x - 2y + z + 2w &= 0,\\
x + y - z - w &= 0,\\
2x - 2y + 3z &= 0.
\end{aligned}
\]

Again we use the elimination method. Multiply the second equation by
2 and subtract it from the third. We find

\[ -4y + 5z + 2w = 0. \]

Multiply the second equation by 3 and subtract it from the first. We
find

\[ -5y + 4z + 5w = 0. \]

We have now eliminated x from our equations, and find two equations
in three unknowns, y, z, w. We eliminate y from these two equations as
follows: Multiply the top one by 5, multiply the bottom one by 4, and
subtract them. We get

\[ 9z - 10w = 0. \]

Now give an arbitrary value $\neq 0$ to w, say w = 1. Then we can solve for
z, namely

\[ z = 10/9. \]

Going back to the equations before that, we solve for y, using

\[ 4y = 5z + 2w. \]

This yields

\[ y = 17/9. \]

Finally we solve for x using say the second of the original set of three
equations, so that

\[ x = -y + z + w, \]

or numerically,

\[ x = 2/9. \]

Thus we have found:

\[ w = 1, \quad z = 10/9, \quad y = 17/9, \quad x = 2/9. \]


Note that we had three equations in four unknowns. By a successive 
elimination of variables, we reduced these equations to two equations in 
three unknowns, and then one equation in two unknowns. 
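
The back-substitution of Example 3 can be repeated with exact fractions, which also lets us substitute the values into the three original equations. This is a minimal Python sketch; the variable names follow the example, and the list `residuals` is introduced here only for the check.

```python
from fractions import Fraction as F

# Choose the free variable, then work backwards through the eliminated equations.
w = F(1)
z = F(10, 9) * w             # from 9z - 10w = 0
y = (5 * z + 2 * w) / 4      # from -4y + 5z + 2w = 0
x = -y + z + w               # from x + y - z - w = 0

# Left-hand sides of the three original equations; each should be 0.
residuals = [
    3 * x - 2 * y + z + 2 * w,
    x + y - z - w,
    2 * x - 2 * y + 3 * z,
]

print(x, y, z, w)    # 2/9 17/9 10/9 1
print(residuals)
```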


Using precisely the same method, suppose that we start with three 
equations in five unknowns. Eliminating one variable will yield two 
equations in four unknowns. Eliminating another variable will yield one 
equation in three unknowns. We can then solve this equation, and pro- 
ceed backwards to get values for the previous variables just as we have 
shown in the examples.

In general, suppose that we start with m equations with n unknowns,
and n > m. We eliminate one of the variables, say $x_1$, and obtain a
system of m - 1 equations in n - 1 unknowns. We eliminate a second
variable, say $x_2$, and obtain a system of m - 2 equations in n - 2
unknowns. Proceeding stepwise, we eliminate m - 1 variables, ending up
with 1 equation in n - m + 1 unknowns. We then give non-trivial arbitrary
values to all the remaining variables but one, solve for this last
variable, and then proceed backwards to solve successively for each one
of the eliminated variables as we did in our examples. Thus we have an
effective way of finding a non-trivial solution for the original system.

We shall phrase this in terms of induction in a precise manner. 

Let $A = (a_{ij})$, $i = 1, \ldots, m$ and $j = 1, \ldots, n$ be a matrix. Let $b_1, \ldots, b_m$ be
numbers. Equations like

\[
\begin{aligned}
a_{11}x_1 + \cdots + a_{1n}x_n &= b_1\\
&\;\;\vdots\\
a_{m1}x_1 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\tag{$*$}
\]

are called linear equations. We also say that (*) is a system of linear
equations. The system is said to be homogeneous if all the numbers
$b_1, \ldots, b_m$ are equal to 0. The number n is called the number of
unknowns, and m is the number of equations.

The system of equations

\[
\begin{aligned}
a_{11}x_1 + \cdots + a_{1n}x_n &= 0\\
&\;\;\vdots\\
a_{m1}x_1 + \cdots + a_{mn}x_n &= 0
\end{aligned}
\tag{$**$}
\]

will be called the homogeneous system associated with (*). In this section,
we study the homogeneous system (**).

The system (**) always has a solution, namely the solution obtained
by letting all $x_j = 0$. This solution will be called the trivial solution. A
solution $(x_1, \ldots, x_n)$ such that some $x_i$ is $\neq 0$ is called non-trivial.

Consider our system of homogeneous equations (**). Let $A_1, \ldots, A_m$
be the row vectors of the matrix $(a_{ij})$. Then we can rewrite our
equations (**) in the form

\[
\begin{aligned}
A_1 \cdot X &= 0\\
&\;\;\vdots\\
A_m \cdot X &= 0.
\end{aligned}
\tag{$**$}
\]

Therefore the set of solutions of the system of linear equations can be
interpreted as the set of all n-tuples X which are perpendicular to the row
vectors of the matrix A. Geometrically, to find a solution of (**) amounts to
finding a vector X which is perpendicular to $A_1, \ldots, A_m$. Using the notation
of the dot product will make it easier to formulate the proof of our main
theorem, namely:

Theorem 3.1. Let

\[
\begin{aligned}
a_{11}x_1 + \cdots + a_{1n}x_n &= 0\\
&\;\;\vdots\\
a_{m1}x_1 + \cdots + a_{mn}x_n &= 0
\end{aligned}
\tag{$**$}
\]

be a system of m linear equations in n unknowns, and assume that
n > m. Then the system has a non-trivial solution.


Proof. The proof will be carried out by induction.
Consider first the case of one equation in n unknowns, n > 1:

\[ a_1x_1 + \cdots + a_nx_n = 0. \]

If all coefficients $a_1, \ldots, a_n$ are equal to 0, then any value of the variables
will be a solution, and a non-trivial solution certainly exists. Suppose
that some coefficient $a_i$ is $\neq 0$. After renumbering the variables and the
coefficients, we may assume that it is $a_1$. Then we give $x_2, \ldots, x_n$ arbitrary
values, for instance we let $x_2 = \cdots = x_n = 1$, and solve for $x_1$, letting

\[ x_1 = \frac{-1}{a_1}(a_2 + \cdots + a_n). \]

In that manner, we obtain a non-trivial solution for our system of equations.

Let us now assume that our theorem is true for a system of m - 1
equations in more than m - 1 unknowns. We shall prove that it is true
for m equations in n unknowns when n > m. We consider the system
(**).

If all coefficients $(a_{ij})$ are equal to 0, we can give any non-zero value
to our variables to get a solution. If some coefficient is not equal to 0,
then after renumbering the equations and the variables, we may assume
that it is $a_{11}$. We shall subtract a multiple of the first equation from the
others to eliminate $x_1$. Namely, we consider the system of equations

\[
\begin{aligned}
\Bigl(A_2 - \frac{a_{21}}{a_{11}}A_1\Bigr) \cdot X &= 0\\
&\;\;\vdots\\
\Bigl(A_m - \frac{a_{m1}}{a_{11}}A_1\Bigr) \cdot X &= 0,
\end{aligned}
\]

which can also be written in the form

\[
\begin{aligned}
A_2 \cdot X - \frac{a_{21}}{a_{11}}A_1 \cdot X &= 0\\
&\;\;\vdots\\
A_m \cdot X - \frac{a_{m1}}{a_{11}}A_1 \cdot X &= 0.
\end{aligned}
\tag{$***$}
\]

In this system, the coefficient of $x_1$ is equal to 0. Hence we may view
(***) as a system of m - 1 equations in n - 1 unknowns, and we have
n - 1 > m - 1.

According to our assumption, we can find a non-trivial solution
$(x_2, \ldots, x_n)$ for this system. We can then solve for $x_1$ in the first
equation, namely

\[ x_1 = \frac{-1}{a_{11}}(a_{12}x_2 + \cdots + a_{1n}x_n). \]

In that way, we find a solution of $A_1 \cdot X = 0$. But according to (***), we
have

\[ A_i \cdot X = \frac{a_{i1}}{a_{11}}A_1 \cdot X \]

for $i = 2, \ldots, m$. Hence $A_i \cdot X = 0$ for $i = 2, \ldots, m$, and therefore we have
found a non-trivial solution to our original system (**).

The argument we have just given allows us to proceed stepwise from
one equation to two equations, then from two to three, and so forth.
This concludes the proof.
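
The induction in the proof is effective, and can be turned into a short recursive procedure. The Python sketch below is one possible rendering, not the only one: it picks a non-zero coefficient, eliminates that variable from the other equations, solves the smaller system recursively, and then solves for the eliminated variable. The function name `nontrivial_solution` and the choice of the value 1 for free variables are assumptions made here; exact fractions are used to avoid rounding.

```python
from fractions import Fraction as F

def nontrivial_solution(rows, n):
    """Return a non-zero list x of length n with sum(a_j * x_j) = 0 for each row.

    rows is a list of m coefficient lists of length n, and n > m is required.
    """
    rows = [[F(a) for a in row] for row in rows]
    assert n > len(rows), "need more unknowns than equations"

    # Look for a non-zero coefficient a_ij to play the role of a_11 in the proof.
    pivot = next(((i, j) for i, row in enumerate(rows)
                  for j, a in enumerate(row) if a != 0), None)
    if pivot is None:
        # All coefficients are 0 (or there are no equations): any values work.
        return [F(1)] * n

    i, j = pivot
    p = rows[i]

    # Subtract a multiple of the pivot equation from the others to eliminate x_j,
    # then drop the j-th column, which now consists of zeros.
    reduced = []
    for k, row in enumerate(rows):
        if k == i:
            continue
        factor = row[j] / p[j]
        new_row = [a - factor * b for a, b in zip(row, p)]
        reduced.append(new_row[:j] + new_row[j + 1:])

    # m - 1 equations in n - 1 unknowns: apply the induction assumption.
    rest = nontrivial_solution(reduced, n - 1)

    # Solve the pivot equation for x_j.
    others = p[:j] + p[j + 1:]
    x_j = -sum(a * x for a, x in zip(others, rest)) / p[j]
    return rest[:j] + [x_j] + rest[j:]

# The system of Example 3, with unknowns in the order x, y, z, w.
system = [[3, -2, 1, 2],
          [1, 1, -1, -1],
          [2, -2, 3, 0]]
solution = nontrivial_solution(system, 4)
print(solution)
```

For the system of Example 3 this procedure happens to make the same choices as the worked example, so it returns the same solution.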


Exercises II, §3

1. Let

\[ E_1 = (1, 0, \ldots, 0), \quad E_2 = (0, 1, 0, \ldots, 0), \quad \ldots, \quad E_n = (0, \ldots, 0, 1) \]

be the standard unit vectors of $\mathbf{R}^n$. Let X be an n-tuple. If $X \cdot E_i = 0$ for all i,
show that X = O.

2. Let $A_1, \ldots, A_m$ be vectors in $\mathbf{R}^n$. Let X, Y be solutions of the system of
equations

\[ X \cdot A_i = 0 \quad\text{and}\quad Y \cdot A_i = 0 \quad\text{for}\quad i = 1, \ldots, m. \]

Show that X + Y is also a solution. If c is a number, show that cX is a
solution.

3. In Exercise 2, suppose that X is perpendicular to each one of the vectors
$A_1, \ldots, A_m$. Let $c_1, \ldots, c_m$ be numbers. A vector

\[ c_1A_1 + \cdots + c_mA_m \]

is called a linear combination of $A_1, \ldots, A_m$. Show that X is perpendicular to
such a vector.

4. Consider the inhomogeneous system (*) consisting of all X such that $X \cdot A_i = b_i$
for $i = 1, \ldots, m$. If X and X' are two solutions of this system, show that
there exists a solution Y of the homogeneous system (**) such that
X' = X + Y. Conversely, if X is any solution of (*), and Y a solution of (**), show
that X + Y is a solution of (*).


5. Find at least one non-trivial solution for each one of the following systems of
equations. Since there are many choices involved, we don't give answers.

(a) 3x + y + z = 0

(b) 3x + y + z = 0
    x + y + z = 0

(c) 2x - 3y + 4z = 0
    3x + y + z = 0

(d) 2x + y + 4z + w = 0
    -3x + 2y - 3z + w = 0
    x + y + z = 0

(e) -x + 2y - 4z + w = 0
    x + 3y + z - w = 0

(f) -2x + 3y + z + 4w = 0
    x + y + 2z + 3w = 0
    2x + y + z - 2w = 0


6. Show that the only solutions of the following systems of equations are trivial.

(a) 2x + 3y = 0
    x - y = 0

(b) 4x + 5y = 0
    -6x + 7y = 0

(c) 3x + 4y - 2z = 0
    x + y + z = 0
    -x - 3y + 5z = 0

(d) 4x - 7y + 3z = 0
    x + y = 0
    y - 6z = 0

(e) 7x - 2y + 5z + w = 0
    x - y + z = 0
    y - 2z + w = 0
    x + z + w = 0

(f) -3x + y + z = 0
    x - y + z - 2w = 0
    x - z + w = 0
    -x + y - 3w = 0


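II, §4. Row Operations and Gauss Elimination

Consider the system of linear equations

\[
\begin{aligned}
3x - 2y + z + 2w &= 1,\\
x + y - z - w &= -2,\\
2x - y + 3z &= 4.
\end{aligned}
\]

The matrix of coefficients is

\[ \begin{pmatrix} 3 & -2 & 1 & 2\\ 1 & 1 & -1 & -1\\ 2 & -1 & 3 & 0 \end{pmatrix}. \]

By the augmented matrix we shall mean the matrix obtained by inserting
the column

\[ \begin{pmatrix} 1\\ -2\\ 4 \end{pmatrix} \]

as a last column, so the augmented matrix is

\[ \begin{pmatrix} 3 & -2 & 1 & 2 & 1\\ 1 & 1 & -1 & -1 & -2\\ 2 & -1 & 3 & 0 & 4 \end{pmatrix}. \]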


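In general, let AX = B be a system of m linear equations in n unknowns,
which we write in full:

\[
\begin{aligned}
a_{11}x_1 + \cdots + a_{1n}x_n &= b_1,\\
a_{21}x_1 + \cdots + a_{2n}x_n &= b_2,\\
&\;\;\vdots\\
a_{m1}x_1 + \cdots + a_{mn}x_n &= b_m.
\end{aligned}
\]

Then we define the augmented matrix to be the m by n + 1 matrix:

\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} & b_1\\
a_{21} & a_{22} & \cdots & a_{2n} & b_2\\
\vdots & \vdots & & \vdots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m
\end{pmatrix}.
\]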


In the examples of homogeneous linear equations of the preceding 
section, you will notice that we performed the following operations, 
called elementary row operations: 


Multiply one equation by a non-zero number. 
Add one equation to another. 
Interchange two equations. 


These operations are reflected in operations on the augmented matrix of 
coefficients, which are also called elementary row operations: 


Multiply one row by a non-zero number. 
Add one row to another. 
Interchange two rows. 


Suppose that a system of linear equations is changed by an elementary
row operation. Then the solutions of the new system are exactly the
same as the solutions of the old system. By making row operations, we
can hope to simplify the shape of the system so that it is easier to find
the solutions.

Let us define two matrices to be row equivalent if one can be obtained 
from the other by a succession of elementary row operations. If A is the 
matrix of coefficients of a system of linear equations, and B the column 
vector as above, so that 


(A, B) 


is the augmented matrix, and if (A', B') is row-equivalent to (A, B) then
the solutions of the system

\[ AX = B \]

are the same as the solutions of the system

\[ A'X = B'. \]


To obtain an equivalent system (A’, B’) as simple as possible we use a 
method which we first illustrate in a concrete case. 


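Example. Consider the augmented matrix in the above example. We
have the following row equivalences:

\[ \begin{pmatrix} 3 & -2 & 1 & 2 & 1\\ 1 & 1 & -1 & -1 & -2\\ 2 & -1 & 3 & 0 & 4 \end{pmatrix} \]

Subtract 3 times second row from first row.

\[ \begin{pmatrix} 0 & -5 & 4 & 5 & 7\\ 1 & 1 & -1 & -1 & -2\\ 2 & -1 & 3 & 0 & 4 \end{pmatrix} \]

Subtract 2 times second row from third row.

\[ \begin{pmatrix} 0 & -5 & 4 & 5 & 7\\ 1 & 1 & -1 & -1 & -2\\ 0 & -3 & 5 & 2 & 8 \end{pmatrix} \]

Interchange first and second row; multiply second row by -1.

\[ \begin{pmatrix} 1 & 1 & -1 & -1 & -2\\ 0 & 5 & -4 & -5 & -7\\ 0 & -3 & 5 & 2 & 8 \end{pmatrix} \]

Multiply second row by 3; multiply third row by 5.

\[ \begin{pmatrix} 1 & 1 & -1 & -1 & -2\\ 0 & 15 & -12 & -15 & -21\\ 0 & -15 & 25 & 10 & 40 \end{pmatrix} \]

Add second row to third row.

\[ \begin{pmatrix} 1 & 1 & -1 & -1 & -2\\ 0 & 15 & -12 & -15 & -21\\ 0 & 0 & 13 & -5 & 19 \end{pmatrix} \]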

What we have achieved is to make each successive row start with a non- 
zero entry at least one step further than the preceding row. This makes 
it very simple to solve the equations. The new system whose augmented 
matrix is the matrix obtained last can be written in the form: 


\[
\begin{aligned}
x + y - z - w &= -2,\\
15y - 12z - 15w &= -21,\\
13z - 5w &= 19.
\end{aligned}
\]

This is now in a form where we can solve by giving w an arbitrary value
in the third equation, and solve for z from the third equation. Then we
solve for y from the second, and x from the first. With the formulas, this
gives:

\[
\begin{aligned}
z &= \frac{19 + 5w}{13},\\
y &= \frac{-21 + 12z + 15w}{15},\\
x &= -2 - y + z + w.
\end{aligned}
\]


We can give w any value to start with, and then determine values for 
x, y, z. Thus we see that the solutions depend on one free parameter. 
Later we shall express this property by saying that the set of solutions 
has dimension 1. 

For the moment, we give a general name to the above procedure. Let 
M be a matrix. We shall say that M is in row echelon form if it has the 
following property: 


Whenever two successive rows do not consist entirely of zeros, then the 
second row starts with a non-zero entry at least one step further to the 
right than the first row. All the rows consisting entirely of zeros are 
at the bottom of the matrix. 
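
The definition can be tested mechanically. The Python sketch below checks the two conditions row by row; the function names `leading_index` and `is_row_echelon` are introduced here for illustration.

```python
def leading_index(row):
    """Column of the first non-zero entry of a row, or None for a zero row."""
    for j, a in enumerate(row):
        if a != 0:
            return j
    return None

def is_row_echelon(M):
    """True when M satisfies the definition of row echelon form."""
    previous = -1            # column of the previous leading entry
    seen_zero_row = False
    for row in M:
        lead = leading_index(row)
        if lead is None:
            seen_zero_row = True
            continue
        # A non-zero row may not follow a zero row, and its leading entry
        # must lie strictly to the right of the previous one.
        if seen_zero_row or lead <= previous:
            return False
        previous = lead
    return True

augmented = [[3, -2, 1, 2, 1],
             [1, 1, -1, -1, -2],
             [2, -1, 3, 0, 4]]
reduced = [[1, 1, -1, -1, -2],
           [0, 15, -12, -15, -21],
           [0, 0, 13, -5, 19]]

print(is_row_echelon(augmented))   # False
print(is_row_echelon(reduced))     # True
```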


In the previous example we transformed a matrix into another which
is in row echelon form. The non-zero coefficients occurring furthest to
the left in each row are called the leading coefficients. In the above
example, the leading coefficients are 1, 15, 13. One may perform one
more change by dividing each row by the leading coefficient. Then the
above matrix is row equivalent to

\[
\begin{pmatrix}
1 & 1 & -1 & -1 & -2\\
0 & 1 & -\tfrac{4}{5} & -1 & -\tfrac{7}{5}\\
0 & 0 & 1 & -\tfrac{5}{13} & \tfrac{19}{13}
\end{pmatrix}.
\]

In this last matrix, the leading coefficient of each row is equal to 1. One
could make further row operations to insert further zeros, for instance
subtract the second row from the first, and then subtract $-\tfrac{4}{5}$ times the
third row from the second. This yields:

\[
\begin{pmatrix}
1 & 0 & -\tfrac{1}{5} & 0 & -\tfrac{3}{5}\\
0 & 1 & 0 & -\tfrac{17}{13} & -\tfrac{3}{13}\\
0 & 0 & 1 & -\tfrac{5}{13} & \tfrac{19}{13}
\end{pmatrix}.
\]


Unless the matrix is rigged so that the fractions do not look too hor- 
rible, it is usually a pain to do this further row equivalence by hand, but 
a machine would not care. 
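
Here is a minimal Python sketch of what such a machine computation could look like, using exact fractions. It goes one step further than the text: it clears every entry above and below each leading coefficient, giving what is usually called the reduced row echelon form. The function name `reduced_row_echelon` is an assumption made here, and subtracting a multiple of one row from another is used as a single step, being a combination of the elementary operations listed earlier.

```python
from fractions import Fraction as F

def reduced_row_echelon(M):
    """Return a row-equivalent matrix with leading coefficients 1 and zeros
    above and below each leading coefficient."""
    M = [[F(a) for a in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0                                    # row where the next leading entry goes
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                         # nothing to do in this column
        M[r], M[pivot] = M[pivot], M[r]      # interchange two rows
        lead = M[r][c]
        M[r] = [a / lead for a in M[r]]      # multiply a row by a non-zero number
        for i in range(rows):
            if i != r and M[i][c] != 0:
                factor = M[i][c]
                # subtract a multiple of the pivot row from row i
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return M

augmented = [[3, -2, 1, 2, 1],
             [1, 1, -1, -1, -2],
             [2, -1, 3, 0, 4]]
for row in reduced_row_echelon(augmented):
    print([str(a) for a in row])
```

The last column of the result gives the solution with w = 0, and the fourth column shows how x, y, z change with the free parameter w.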


Example. The following matrix is in row echelon form. 


\[
\begin{pmatrix}
0 & 2 & -3 & 4 & 1 & 7\\
0 & 0 & 0 & 5 & 2 & -4\\
0 & 0 & 0 & 0 & -3 & 1\\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}
\]

Suppose that this matrix is the augmented matrix of a system of linear
equations, then we can solve the linear equations by giving some variables
an arbitrary value as we did. Indeed, the equations are:

\[
\begin{aligned}
2y - 3z + 4w + t &= 7,\\
5w + 2t &= -4,\\
-3t &= 1.
\end{aligned}
\]

Then the solutions are

\[
\begin{aligned}
t &= -1/3,\\
w &= \frac{-4 - 2t}{5},\\
z &= \text{any arbitrarily given value},\\
y &= \frac{7 + 3z - 4w - t}{2},\\
x &= \text{any arbitrarily given value}.
\end{aligned}
\]

The method of changing a matrix by row equivalences to put it in row
echelon form works in general.


Theorem 4.1. Every matrix is row equivalent to a matrix in row echelon 
form. 


Proof. Select a non-zero entry furthest to the left in the matrix. If this 
entry is not in the first column, this means that the matrix consists 
entirely of zeros to the left of this entry, and we can forget about them. 
So suppose this non-zero entry is in the first column. After an inter- 
change of rows, we can find an equivalent matrix such that the upper 
left-hand corner is not 0. Say the matrix is 


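$$\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}$$

and $a_{11} \neq 0$. We multiply the first row by $a_{21}/a_{11}$ and subtract it from the
second row. Similarly, we multiply the first row by $a_{i1}/a_{11}$ and subtract
it from the i-th row. Then we obtain a matrix which has zeros in the
first column except for $a_{11}$. Thus the original matrix is row equivalent
to a matrix of the form

$$\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
0 & a'_{22} & \cdots & a'_{2n} \\
\vdots & \vdots & & \vdots \\
0 & a'_{m2} & \cdots & a'_{mn}
\end{pmatrix}.$$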


We can continue until the matrix is in row echelon form (formally by 
induction). This concludes the proof. 
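
The elimination in this proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `row_echelon` is hypothetical, matrices are plain lists of rows, and exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction

def row_echelon(matrix):
    """Return a row echelon form of `matrix` (a list of rows), using exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    m = len(rows)
    n = len(rows[0]) if rows else 0
    pivot_row = 0
    for col in range(n):
        # Look for a row, at or below pivot_row, with a non-zero entry in this column.
        candidates = [i for i in range(pivot_row, m) if rows[i][col] != 0]
        if not candidates:
            continue  # nothing to eliminate in this column
        i = candidates[0]
        rows[pivot_row], rows[i] = rows[i], rows[pivot_row]  # interchange two rows
        pivot = rows[pivot_row][col]
        for k in range(pivot_row + 1, m):
            factor = rows[k][col] / pivot
            # Subtract factor times the pivot row from row k.
            rows[k] = [a - factor * b for a, b in zip(rows[k], rows[pivot_row])]
        pivot_row += 1
        if pivot_row == m:
            break
    return rows

example = [[2, -3, 1], [1, 1, -1], [2, 0, 1]]
echelon = row_echelon(example)
```

The loop over columns plays the role of the induction: once the first column has been cleared below the leading entry, the same step is repeated on the remaining rows and columns.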


Observe that the proof is just another way of formulating the elimina- 
tion argument of §3. 
We give another proof of the fundamental theorem: 


Theorem 4.2. Let 


$$\begin{aligned}
a_{11}x_1 + \cdots + a_{1n}x_n &= 0, \\
&\;\;\vdots \\
a_{m1}x_1 + \cdots + a_{mn}x_n &= 0,
\end{aligned}$$

be a system of m homogeneous linear equations in n unknowns with
$n > m$. Then there exists a non-trivial solution.


Proof. Let $A = (a_{ij})$ be the matrix of coefficients. Then A is row
equivalent to a matrix $A'$ in row echelon form, and the corresponding
system of equations is

$$\begin{aligned}
a_{k_1}x_{k_1} + S_{k_1}(x) &= 0, \\
a_{k_2}x_{k_2} + S_{k_2}(x) &= 0, \\
&\;\;\vdots \\
a_{k_r}x_{k_r} + S_{k_r}(x) &= 0,
\end{aligned}$$

where $a_{k_1} \neq 0, \ldots, a_{k_r} \neq 0$ are the non-zero coefficients of the variables
occurring furthest to the left in each successive row, and $S_{k_1}(x), \ldots, S_{k_r}(x)$
indicate sums of variables with certain coefficients, but such that if a
variable $x_j$ occurs in $S_{k_1}(x)$, then $j > k_1$, and similarly for the other sums.
If $x_j$ occurs in $S_{k_i}$ then $j > k_i$. Since by assumption the total number of
variables n is strictly greater than the number of equations, we must
have $r < n$. Hence there are $n - r$ variables other than $x_{k_1}, \ldots, x_{k_r}$, and
$n - r > 0$. We give these variables arbitrary values, which we can of
course select not all equal to 0. Then we solve for the variables
$x_{k_r}, x_{k_{r-1}}, \ldots, x_{k_1}$, starting with the bottom equation and working back up, for
instance

$$x_{k_r} = -S_{k_r}(x)/a_{k_r}, \qquad x_{k_{r-1}} = -S_{k_{r-1}}(x)/a_{k_{r-1}}, \qquad \text{and so forth.}$$


This gives us the non-trivial solution, and proves the theorem. 
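
The back substitution used in this proof can be sketched as follows. This is a minimal sketch, assuming the coefficient matrix is already in row echelon form and has more unknowns than non-zero rows; the function name `nontrivial_solution` and the choice of giving every free variable the value 1 are illustrative assumptions.

```python
from fractions import Fraction

def nontrivial_solution(echelon_rows, n):
    """Solve a homogeneous system whose coefficient matrix is in row echelon form.

    Every variable without a leading coefficient is given the value 1;
    the remaining variables are found from the bottom equation upward.
    """
    rows = [[Fraction(a) for a in row] for row in echelon_rows if any(row)]
    # Index k_i of the leading coefficient in each non-zero row.
    pivots = [next(j for j, a in enumerate(row) if a != 0) for row in rows]
    x = [Fraction(1)] * n
    for row, k in zip(reversed(rows), reversed(pivots)):
        s = sum(row[j] * x[j] for j in range(k + 1, n))  # the sum S_k(x)
        x[k] = -s / row[k]
    return x

# Coefficient matrix in the unknowns x, y, z, w, t (a hypothetical homogeneous system).
coefficients = [
    [0, 2, -3, 4, 1],
    [0, 0, 0, 5, 2],
    [0, 0, 0, 0, -3],
]
solution = nontrivial_solution(coefficients, 5)
```

Here there are three leading coefficients and five unknowns, so two variables are free, which is what guarantees that the solution found is not the zero vector.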


Observe that the pattern follows exactly that of the examples, but with 
a notation dealing with the general case. 


Exercises II, §4
In each of the following cases find a row equivalent matrix in row echelon form. 


1. (a) 6 3 —4 (b) /1 0 2 


—4 1 —6 a 3 
1 2 —S 4 1 8 
Day =>. 3X. d (b /O 1 3 —2 
2 ub A 3 2 1 -4 3
Mast Xp 2 j (b) /1 2
2- 4 fae 3 0 11 -5 3 

3 6 2-6 5 2 1 

4 1 1 5 


4. Write down the coefficient matrix of the linear equations of Exercise 5 in §3,
and in each case give a row equivalent matrix in echelon form. Solve the 
linear equations in each case by this method. 


II, §5. Row Operations and Elementary Matrices


Before reading this section, work out the numerical examples given in 
Exercises 33 through 37 of §2.


The row operations which we used to solve linear equations can be 
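represented by matrix operations. Let $1 \leq r \leq m$ and $1 \leq s \leq m$. Let $I_{rs}$
be the square m x m matrix which has component 1 in the rs place, and
0 elsewhere:

$$I_{rs} = \begin{pmatrix}
0 & \cdots & \cdots & 0 \\
\vdots & 1_{rs} & & \vdots \\
0 & \cdots & \cdots & 0
\end{pmatrix}.$$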


Let $A = (a_{ij})$ be any m x n matrix. What is the effect of multiplying
$I_{rs}A$?

$$I_{rs}A = \begin{pmatrix}
0 & \cdots & 0 \\
\vdots & & \vdots \\
a_{s1} & \cdots & a_{sn} \\
\vdots & & \vdots \\
0 & \cdots & 0
\end{pmatrix} \qquad (\text{the entries } a_{s1}, \ldots, a_{sn} \text{ stand in the } r\text{-th row}).$$

The definition of multiplication of matrices shows that $I_{rs}A$ is the matrix
obtained by putting the s-th row of A in the r-th row, and zeros
elsewhere.

If $r = s$ then $I_{rr}$ has a component 1 on the diagonal place, and 0
elsewhere. Multiplication by $I_{rr}$ then leaves the r-th row fixed, and
replaces all the other rows by zeros.

If $r \neq s$ let

$$J_{rs} = I_{rs} + I_{sr}.$$

Then

$$J_{rs}A = I_{rs}A + I_{sr}A.$$

Then $I_{rs}A$ puts the s-th row of A in the r-th place, and $I_{sr}A$ puts the
r-th row of A in the s-th place. All other rows are replaced by zero.
Thus $J_{rs}$ interchanges the r-th row and the s-th row, and replaces all
other rows by zero.


Example. Let 


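$$J = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\qquad \text{and} \qquad
A = \begin{pmatrix} 3 & 2 & -1 \\ 1 & 4 & 2 \\ -2 & 3 & 1 \end{pmatrix}.$$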


If you perform the matrix multiplication, you will see directly that JA 
interchanges the first and second row of A, and replaces the third row 
by zero. 

On the other hand, let 


$$E = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$


Then EA is the matrix obtained from A by interchanging the first and 
second row, and leaving the third row fixed. We can express E as a 
sum: 


$$E = I_{12} + I_{21} + I_{33},$$

where $I_{rs}$ is the matrix which has rs-component 1, and all other components
0 as before. Observe that E is obtained from the unit matrix by
interchanging the first two rows, and leaving the third row unchanged. 
Thus the operation of interchanging the first two rows of A is carried 
out by multiplication with the matrix E obtained by doing this operation 
on the unit matrix. 
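
The two products in this example can be checked numerically. The sketch below is an illustration only: the helper names `matmul`, `add`, and `matrix_unit` are hypothetical, and A is taken to be the 3 x 3 matrix with rows (3, 2, -1), (1, 4, 2), (-2, 3, 1).

```python
def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def add(P, Q):
    """Sum of two matrices of the same size."""
    return [[a + b for a, b in zip(p, q)] for p, q in zip(P, Q)]

def matrix_unit(r, s, m):
    """The m x m matrix I_rs: 1 in the rs place (rows and columns counted from 1), 0 elsewhere."""
    return [[1 if (i, j) == (r - 1, s - 1) else 0 for j in range(m)] for i in range(m)]

A = [[3, 2, -1], [1, 4, 2], [-2, 3, 1]]
J = add(matrix_unit(1, 2, 3), matrix_unit(2, 1, 3))   # J = I_12 + I_21
E = add(J, matrix_unit(3, 3, 3))                      # E = I_12 + I_21 + I_33

I12_A = matmul(matrix_unit(1, 2, 3), A)  # second row of A placed in the first row
JA = matmul(J, A)   # first two rows interchanged, third row replaced by zero
EA = matmul(E, A)   # first two rows interchanged, third row unchanged
```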

This is a special case of the following general fact. 


Theorem 5.1. Let E be the matrix obtained from the unit n x n matrix 
by interchanging two rows. Let A be an n x n matrix. Then EA is the 
matrix obtained from A by interchanging these two rows. 




Proof. The proof is carried out according to the pattern of the exam- 
ple, it is only a question of which symbols are used. Suppose that we 
interchange the r-th and s-th row. Then we can write 


$$E = I_{rs} + I_{sr} + \text{sum of the matrices } I_{jj} \text{ with } j \neq r,\ j \neq s.$$

Thus E differs from the unit matrix by interchanging the r-th and s-th
rows. Then

$$EA = I_{rs}A + I_{sr}A + \text{sum of the matrices } I_{jj}A,$$

with $j \neq r$, $j \neq s$. By the previous discussion, this is precisely the matrix
obtained by interchanging the r-th and s-th rows of A, and leaving all 
the other rows unchanged. 


The same type of discussion also yields the next result. 


Theorem 5.2. Let E be the matrix obtained from the unit n x n matrix 
by multiplying the r-th row with a number c and adding it to the s-th 
row, $r \neq s$. Let A be an n x n matrix. Then EA is obtained from A by
multiplying the r-th row of A by c and adding it to the s-th row of A. 


Proof. We can write 
$$E = I + cI_{sr}.$$

Then $EA = A + cI_{sr}A$. We know that $I_{sr}A$ puts the r-th row of A in the
s-th place, and multiplication by c multiplies this row by c. All other
rows besides the s-th row in $cI_{sr}A$ are equal to 0. Adding $A + cI_{sr}A$
therefore has the effect of adding c times the r-th row of A to the s-th 
row of A, as was to be shown. 


Example. Let 


$$E = \begin{pmatrix}
1 & 0 & 4 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}.$$


Then E is obtained from the unit matrix by adding 4 times the third row 
to the first row. Take any 4 x n matrix A and compute EA. You will 
find that EA is obtained by multiplying the third row of A by 4 and 
adding it to the first row of A.

More generally, we can let $E_{rs}(c)$ for $r \neq s$ be the elementary matrix

$$E_{rs}(c) = I + cI_{rs}.$$

It differs from the unit matrix by having rs-component equal to c. The
effect of multiplication on the left by $E_{rs}(c)$ is to add c times the s-th row
to the r-th row.

By an elementary matrix, we shall mean any one of the following 
three types: 


(a) A matrix obtained from the unit matrix by multiplying the r-th
diagonal component with a number $c \neq 0$.

(b) A matrix obtained from the unit matrix by interchanging two
rows (say the r-th and s-th row, $r \neq s$).

(c) A matrix $E_{rs}(c) = I + cI_{rs}$ with $r \neq s$, having rs-component c,
and all other components 0 except the diagonal components
which are equal to 1.


These three types reflect the row operations discussed in the preceding 
section. 


Multiplication by a matrix of type (a) multiplies the r-th row by the 
number c. 

Multiplication by a matrix of type (b) interchanges the r-th and s-th 
row. 

Multiplication by a matrix of type (c) adds c times the s-th row to the 
r-th row. 
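
The three types can be tried out on a small matrix. The following sketch is only an illustration; the function names `scale_matrix`, `swap_matrix`, and `add_matrix` are hypothetical, rows are counted from 1 as in the text, and the 3 x 2 matrix A is an arbitrary choice.

```python
def identity(n):
    """The unit n x n matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def scale_matrix(n, r, c):
    """Type (a): unit matrix with the r-th diagonal component replaced by c (c must be non-zero)."""
    E = identity(n)
    E[r - 1][r - 1] = c
    return E

def swap_matrix(n, r, s):
    """Type (b): unit matrix with the r-th and s-th rows interchanged."""
    E = identity(n)
    E[r - 1], E[s - 1] = E[s - 1], E[r - 1]
    return E

def add_matrix(n, r, s, c):
    """Type (c): E_rs(c) = I + c I_rs, with r different from s."""
    E = identity(n)
    E[r - 1][s - 1] = c
    return E

A = [[1, 2], [3, 4], [5, 6]]
scaled = matmul(scale_matrix(3, 2, 10), A)   # second row multiplied by 10
swapped = matmul(swap_matrix(3, 1, 3), A)    # first and third rows interchanged
added = matmul(add_matrix(3, 1, 3, 4), A)    # 4 times the third row added to the first
```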


Proposition 5.3. An elementary matrix is invertible. 


Proof. For type (a), the inverse matrix has r-th diagonal component
$c^{-1}$, because multiplying a row first by c and then by $c^{-1}$ leaves the row
unchanged.

For type (b), we note that by interchanging the r-th and s-th row 
twice we return to the same matrix we started with. 

For type (c), as in Theorem 5.2, let E be the matrix which adds c
times the s-th row to the r-th row of the unit matrix. Let D be the
matrix which adds $-c$ times the s-th row to the r-th row of the unit
matrix (for $r \neq s$). Then DE is the unit matrix, and so is ED, so E is
invertible.


Example. The following elementary matrices are inverse to each
other:

$$E = \begin{pmatrix}
1 & 0 & 4 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix},
\qquad
E^{-1} = \begin{pmatrix}
1 & 0 & -4 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}.$$


We shall find an effective way of finding the inverse of a square ma- 
trix if it has one. This is based on the following properties. 


If A, B are square matrices of the same size and have inverses, then so 
does the product AB, and 


$$(AB)^{-1} = B^{-1}A^{-1}.$$

This is immediate, because

$$ABB^{-1}A^{-1} = AIA^{-1} = AA^{-1} = I.$$
Similarly, for any number of factors: 


Proposition 5.4. If $A_1, \ldots, A_k$ are invertible matrices of the same size,
then their product has an inverse, and

$$(A_1 \cdots A_k)^{-1} = A_k^{-1} \cdots A_1^{-1}.$$

Note that in the right-hand side, we take the product of the inverses in
reverse order. Then

$$A_1 \cdots A_k A_k^{-1} \cdots A_1^{-1} = I$$

because we can collapse $A_k A_k^{-1}$ to I, then $A_{k-1}A_{k-1}^{-1}$ to I, and so forth.
Since an elementary matrix has an inverse, we conclude that any
product of elementary matrices has an inverse.
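
The reverse-order rule can be checked on a product of elementary matrices, whose inverses are known without any computation. This is a small sketch under stated assumptions: the three 3 x 3 matrices are arbitrary choices, one of each type, and the helper names are hypothetical.

```python
from fractions import Fraction

def identity(n):
    """The unit n x n matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

# Three elementary matrices and their inverses (rows counted from 1 in the comments).
E1 = [[1, 0, 4], [0, 1, 0], [0, 0, 1]]        # add 4 times row 3 to row 1
E1_inv = [[1, 0, -4], [0, 1, 0], [0, 0, 1]]   # add -4 times row 3 to row 1
E2 = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]        # interchange rows 1 and 2
E2_inv = E2                                   # an interchange undoes itself
E3 = [[1, 0, 0], [0, 5, 0], [0, 0, 1]]        # multiply row 2 by 5
E3_inv = [[1, 0, 0], [0, Fraction(1, 5), 0], [0, 0, 1]]

product = matmul(matmul(E1, E2), E3)
reversed_inverses = matmul(matmul(E3_inv, E2_inv), E1_inv)

check_right = matmul(product, reversed_inverses)
check_left = matmul(reversed_inverses, product)
```

Both `check_right` and `check_left` come out as the unit matrix, which is the collapsing argument carried out numerically.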


Proposition 5.5. Let A be a square matrix, and let A' be row equivalent 
to A. Then A has an inverse if and only if A' has an inverse. 


Proof. There exist elementary matrices $E_1, \ldots, E_k$ such that

$$A' = E_k \cdots E_1 A.$$

Suppose that A has an inverse. Then the right-hand side has an inverse
by Proposition 5.4 since the right-hand side is a product of invertible
matrices. Hence $A'$ has an inverse. Conversely, $A = E_1^{-1} \cdots E_k^{-1} A'$,
so if $A'$ has an inverse, then so does A by the same argument. This
proves the proposition.

We are now in a position to find an inverse for a square matrix A
if it has one. By Theorem 4.1 we know that A is row equivalent to a
matrix $A'$ in echelon form. If one row of $A'$ is zero, then by the
definition of echelon form, the last row must be zero, and $A'$ is not invertible,
hence A is not invertible. If all the rows of $A'$ are non-zero, then $A'$ is a
triangular matrix with non-zero diagonal components. It now suffices to
find an inverse for such a matrix. In fact, we prove:


Theorem 5.6. A square matrix A is invertible if and only if A is row 
equivalent to the unit matrix. Any upper triangular matrix with non- 
zero diagonal elements is invertible. 


Proof. Suppose that A is row equivalent to the unit matrix. Then A is 
invertible by Proposition 5.5. Suppose that A is invertible. We have just 
seen that A is row equivalent to an upper triangular matrix with non- 
zero elements on the diagonal. Suppose A is such a matrix: 


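$$\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
0 & a_{22} & \cdots & a_{2n} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & a_{nn}
\end{pmatrix}.$$

By assumption we have $a_{11} \cdots a_{nn} \neq 0$. We multiply the i-th row with
$a_{ii}^{-1}$. We obtain a triangular matrix such that all the diagonal components
are equal to 1. Thus to prove the theorem, it suffices to do it in
this case, and we may assume that A has the form

$$\begin{pmatrix}
1 & a_{12} & \cdots & a_{1n} \\
0 & 1 & \cdots & a_{2n} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}.$$

We multiply the last row by $a_{in}$ and subtract it from the i-th row for
$i = 1, \ldots, n - 1$. This makes all the elements of the last column equal to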
0 except for the lower right-hand corner, which is 1. We repeat this 
procedure with the next to the last row, and continue upward. This 
means that by row equivalences, we can replace all the components 
which lie strictly above the diagonal by 0. We then terminate with the 
unit matrix, which is therefore row equivalent with the original matrix. 
This proves the theorem. 


Corollary 5.7. Let A be an invertible matrix. Then A can be expressed 
as a product of elementary matrices.

Proof. This is because A is row equivalent to the unit matrix, and
row operations are represented by multiplication with elementary
matrices, so there exist $E_1, \ldots, E_k$ such that

$$E_k \cdots E_1 A = I.$$

Then $A = E_1^{-1} \cdots E_k^{-1}$, thus proving the corollary.


When A is so expressed, we also get an expression for the inverse of
A, namely

$$A^{-1} = E_k \cdots E_1.$$

The elementary matrices $E_1, \ldots, E_k$ are those which are used to change A
to the unit matrix.


Example. Let 


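$$A = \begin{pmatrix} 2 & -3 & 1 \\ 1 & 1 & -1 \\ 2 & 0 & 1 \end{pmatrix},
\qquad
I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$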


We want to find an inverse for A. We perform the following row opera- 
tions, corresponding to the multiplication by elementary matrices as 
shown. 


Interchange first two rows. 


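$$\begin{pmatrix} 1 & 1 & -1 \\ 2 & -3 & 1 \\ 2 & 0 & 1 \end{pmatrix},
\qquad
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$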


Subtract 2 times first row from second row. 
Subtract 2 times first row from third row. 


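$$\begin{pmatrix} 1 & 1 & -1 \\ 0 & -5 & 3 \\ 0 & -2 & 3 \end{pmatrix},
\qquad
\begin{pmatrix} 0 & 1 & 0 \\ 1 & -2 & 0 \\ 0 & -2 & 1 \end{pmatrix}.$$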


Subtract 2/5 times second row from third row. 


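$$\begin{pmatrix} 1 & 1 & -1 \\ 0 & -5 & 3 \\ 0 & 0 & 9/5 \end{pmatrix},
\qquad
\begin{pmatrix} 0 & 1 & 0 \\ 1 & -2 & 0 \\ -2/5 & -6/5 & 1 \end{pmatrix}.$$

Subtract 5/3 of third row from second row.
Add 5/9 of third row to first row.

$$\begin{pmatrix} 1 & 1 & 0 \\ 0 & -5 & 0 \\ 0 & 0 & 9/5 \end{pmatrix},
\qquad
\begin{pmatrix} -2/9 & 1/3 & 5/9 \\ 5/3 & 0 & -5/3 \\ -2/5 & -6/5 & 1 \end{pmatrix}.$$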


Add 1/5 of second row to first row. 


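$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & -5 & 0 \\ 0 & 0 & 9/5 \end{pmatrix},
\qquad
\begin{pmatrix} 1/9 & 1/3 & 2/9 \\ 5/3 & 0 & -5/3 \\ -2/5 & -6/5 & 1 \end{pmatrix}.$$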


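Multiply second row by -1/5.
Multiply third row by 5/9.

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\qquad
\begin{pmatrix} 1/9 & 1/3 & 2/9 \\ -1/3 & 0 & 1/3 \\ -2/9 & -2/3 & 5/9 \end{pmatrix}.$$

Then $A^{-1}$ is the matrix on the right, that is

$$A^{-1} = \begin{pmatrix} 1/9 & 1/3 & 2/9 \\ -1/3 & 0 & 1/3 \\ -2/9 & -2/3 & 5/9 \end{pmatrix}.$$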


You can check this by direct multiplication with A to find the unit 
matrix. 
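
The same procedure can be left to a machine. The sketch below is an illustration, not the only way to organize the work: the function name `inverse` is hypothetical, it carries the unit matrix along on the right of each row, and it performs the row operations in a slightly different order from the hand computation (each leading coefficient is made equal to 1 as soon as it is found). A is taken to be the matrix with rows (2, -3, 1), (1, 1, -1), (2, 0, 1).

```python
from fractions import Fraction

def inverse(matrix):
    """Invert a square matrix by row operations on the pair (A, I).

    Returns None when the matrix is not invertible.
    """
    n = len(matrix)
    # Each row of A is followed by the corresponding row of the unit matrix.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((i for i in range(col, n) if aug[i][col] != 0), None)
        if pivot is None:
            return None  # no non-zero entry available: A is not invertible
        aug[col], aug[pivot] = aug[pivot], aug[col]      # interchange rows
        p = aug[col][col]
        aug[col] = [a / p for a in aug[col]]             # make the leading coefficient 1
        for i in range(n):
            if i != col:
                f = aug[i][col]
                # Subtract f times the pivot row from row i.
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]

A = [[2, -3, 1], [1, 1, -1], [2, 0, 1]]
A_inv = inverse(A)
```

Since the inverse of a matrix is unique, the result agrees with the matrix found by hand even though the intermediate steps differ.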


If A is a square matrix and we consider an inhomogeneous system of 
linear equations 


AX = B, 


then we can use the inverse to solve the system, if A is invertible.
Indeed, in this case, we multiply both sides on the left by $A^{-1}$ and we find

$$X = A^{-1}B.$$


This also proves: 


Proposition 5.8. Let AX = B be a system of n linear equations in n
unknowns. Assume that the matrix of coefficients A is invertible. Then
there is a unique solution X to the system, and

$$X = A^{-1}B.$$


Exercises II, §5


1. Using elementary row operations, find inverses for the following matrices. 


(a) /2 1 2 (pz 3 -1 5 

0 3 -1 "2E NE 

4 1 1 —2 4 3 
(0/2 4 3 (d /1 2 -1 
-1 3 0 0 1 1 
0 2 3 0 2 7 

(e) /-1 5 3 t /3 1 2 

4 0 0 4 5 1| 

2" x» 8 "E 


Note: For another way of finding inverses, see the chapter on determinants. 


2. Let $r \neq s$. Show that $I_{rs}^2 = O$.
3. Let $r \neq s$. Let $E_{rs}(c) = I + cI_{rs}$. Show that

$$E_{rs}(c)E_{rs}(c') = E_{rs}(c + c').$$


II, §6. Linear Combinations 


Let $A^1, \ldots, A^n$ be m-tuples in $\mathbf{R}^m$. Let $x_1, \ldots, x_n$ be numbers. Then we call

$$x_1A^1 + \cdots + x_nA^n$$

a linear combination of $A^1, \ldots, A^n$; and we call $x_1, \ldots, x_n$ the coefficients of
the linear combination. A similar definition applies to a linear combination
of row vectors.


The linear combination is called non-trivial if not all the coefficients
$x_1, \ldots, x_n$ are equal to 0.


Consider once more a system of linear homogeneous equations 


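$$(**)\qquad \begin{aligned}
a_{11}x_1 + \cdots + a_{1n}x_n &= 0 \\
&\;\;\vdots \\
a_{m1}x_1 + \cdots + a_{mn}x_n &= 0.
\end{aligned}$$

Our system of homogeneous equations can also be written in the form

$$x_1 \begin{pmatrix} a_{11} \\ \vdots \\ a_{m1} \end{pmatrix} + \cdots + x_n \begin{pmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix},$$

or more concisely:

$$x_1A^1 + \cdots + x_nA^n = O,$$

where $A^1, \ldots, A^n$ are the column vectors of the matrix of coefficients,
which is $A = (a_{ij})$. Thus the problem of finding a non-trivial solution
for the system of homogeneous linear equations is equivalent to finding a
non-trivial linear combination of $A^1, \ldots, A^n$ which is equal to O.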


Vectors $A^1, \ldots, A^n$ are called linearly dependent if there exist numbers
$x_1, \ldots, x_n$ not all equal to 0 such that

$$x_1A^1 + \cdots + x_nA^n = O.$$

Thus a non-trivial solution $(x_1, \ldots, x_n)$ is an n-tuple which gives a linear
combination of $A^1, \ldots, A^n$ equal to O, i.e. a relation of linear dependence
between the columns of A. We may thus summarize the description of the
set of solutions of the system of homogeneous linear equations in a table.


(a) It consists of those vectors X giving linear relations

$$x_1A^1 + \cdots + x_nA^n = O$$

between the columns of A.


(b) It consists of those vectors X perpendicular to the rows of A,
that is $X \cdot A_i = 0$ for all i.


(c) It consists of those vectors X such that AX = O. 
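
The three descriptions in the table can be compared on a small example. This is a sketch under stated assumptions: the 2 x 3 matrix A and the vector X below are hypothetical choices made for illustration, and the variable names are not from the text.

```python
A = [[1, 2, -1],
     [2, 4, -2]]        # a 2 x 3 matrix of coefficients
X = [1, 0, 1]           # a candidate solution (x_1, x_2, x_3)

m, n = len(A), len(A[0])
columns = [[A[i][j] for i in range(m)] for j in range(n)]   # A^1, ..., A^n

# (a) The linear combination x_1 A^1 + ... + x_n A^n of the columns.
combination = [sum(X[j] * columns[j][i] for j in range(n)) for i in range(m)]

# (b) The dot product of X with each row A_i.
dot_products = [sum(x * a for x, a in zip(X, row)) for row in A]

# (c) The product AX, written out component by component.
AX = [sum(A[i][j] * X[j] for j in range(n)) for i in range(m)]
```

All three lists are equal, and here each consists of zeros, so X is a non-trivial solution and the columns of this particular A are linearly dependent.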


Vectors $A^1, \ldots, A^n$ are called linearly independent if, given any linear
combination of them which is equal to O, i.e.

$$x_1A^1 + \cdots + x_nA^n = O,$$

then we must necessarily have $x_j = 0$ for all $j = 1, \ldots, n$. This means that
there is no non-trivial relation of linear dependence among the vectors
$A^1, \ldots, A^n$.

Example. The standard unit vectors

$$E_1 = (1, 0, \ldots, 0), \ldots, E_n = (0, \ldots, 0, 1)$$

of $\mathbf{R}^n$ are linearly independent. Indeed, let $x_1, \ldots, x_n$ be numbers such
that

$$x_1E_1 + \cdots + x_nE_n = O.$$

The left-hand side is just the n-tuple $(x_1, \ldots, x_n)$. If this n-tuple is O, then
all components are 0, so $x_i = 0$ for all i. This proves that $E_1, \ldots, E_n$ are
linearly independent.


We shall study the notions of linear dependence and independence 
more systematically in the next chapter. They were mentioned here just 
to have a complete table for the three basic interpretations of a system 
of linear equations, and to introduce the notion in a concrete special 
case before giving the general definitions in vector spaces. 


Exercise II, §6


1. (a) Let $A = (a_{ij})$, $B = (b_{jk})$ and let AB = C with $C = (c_{ik})$. Let $C^k$ be the k-th
column of C. Express $C^k$ as a linear combination of the columns of A.
Describe precisely which are the coefficients, coming from the matrix B.

(b) Let $AX = C^k$ where X is some column of B. Which column is it?


CHAPTER III


Vector Spaces 


As usual, a collection of objects will be called a set. A member of the 
collection is also called an element of the set. It is useful in practice to 
use short symbols to denote certain sets. For instance we denote by R 
the set of all numbers. To say that “x is a number" or that “x is an 
element of R” amounts to the same thing. The set of n-tuples of
numbers will be denoted by $\mathbf{R}^n$. Thus “X is an element of $\mathbf{R}^n$” and “X
is an n-tuple” mean the same thing. Instead of saying that u is an
element of a set S, we shall also frequently say that u lies in S and we
write $u \in S$. If S and S' are two sets, and if every element of S' is an
element of S, then we say that S' is a subset of S. Thus the set of
rational numbers is a subset of the set of (real) numbers. To say that S'
is a subset of S is to say that S' is part of S. To denote the fact that S'
is a subset of S, we write $S' \subset S$.

If $S_1$, $S_2$ are sets, then the intersection of $S_1$ and $S_2$, denoted by
$S_1 \cap S_2$, is the set of elements which lie in both $S_1$ and $S_2$. The union of
$S_1$ and $S_2$, denoted by $S_1 \cup S_2$, is the set of elements which lie in $S_1$ or
$S_2$.


III, §1. Definitions


In mathematics, we meet several types of objects which can be added 
and multiplied by numbers. Among these are vectors (of the same 
dimension) and functions. It is now convenient to define in general a 
notion which includes these as a special case. 

A vector space V is a set of objects which can be added and multiplied
by numbers, in such a way that the sum of two elements of V is
again an element of V, the product of an element of V by a number is an
element of V, and the following properties are satisfied:


VS 1. Given the elements u, v, w of V, we have
(u + v) + w = u + (v + w).
VS 2. There is an element of V, denoted by O, such that
O + u = u + O = u
for all elements u of V.
VS 3. Given an element u of V, the element (-1)u is such that
u + (-1)u = O.
VS 4. For all elements u, v of V, we have
u + v = v + u.
VS 5. If c is a number, then c(u + v) = cu + cv.
VS 6. If a, b are two numbers, then (a + b)v = av + bv.
VS 7. If a, b are two numbers, then (ab)v = a(bv).
VS 8. For all elements u of V, we have 1·u = u (1 here is the number
one).


We have used all these rules when dealing with vectors, or with func- 
tions but we wish to be more systematic from now on, and hence have 
made a list of them. Further properties which can be easily deduced 
from these are given in the exercises and will be assumed from now on. 

The algebraic properties of elements of an arbitrary vector space are 
very similar to those of elements of $\mathbf{R}^2$, $\mathbf{R}^3$, or $\mathbf{R}^n$. Consequently it is
customary to call elements of an arbitrary vector space also vectors. 

If u, v are vectors (i.e. elements of the arbitrary vector space V), then 
the sum 

u+(—1)v 


is usually written u - v. We also write -v instead of (-1)v.


Example 1. Fix two positive integers m, n. Let V be the set of all 
m x n matrices. We also denote V by Mat(m x n). Then V is a vector
space. It is easy to verify that all properties VS 1 through VS 8 are
satisfied by our rules for addition of matrices and multiplication of
matrices by numbers. The main thing to observe here is that addition of
matrices is defined in terms of the components, and for the addition
of components, the conditions analogous to VS 1 through VS 4 are
satisfied. They are standard properties of numbers. Similarly, VS 5
through VS 8 are true for multiplication of matrices by numbers, because 
the corresponding properties for the multiplication of numbers are true. 


Example 2. Let V be the set of all functions defined for all numbers. 
If f, g are two functions, then we know how to form their sum f +g. It 
is the function whose value at a number t is f(t) + g(t). We also know 
how to multiply f by a number c. It is the function cf whose values at a 
number t is cf(t). In dealing with functions, we have used properties 
VS 1 through VS 8 many times. We now realize that the set of functions 
is a vector space. 

The function f such that f(t) = 0 for all t is the zero function. We
emphasize the condition for all t. If a function has some of its values 
equal to zero, but other values not equal to 0, then it is not the zero 
function. 

In practice, a number of elementary properties concerning addition of 
elements in a vector space are obvious because of the concrete way the 
vector space is given in terms of numbers, for instance as in the previous 
two examples. We shall now see briefly how to prove such properties 
just from the axioms. 

It is possible to add several elements of a vector space. Suppose we 
wish to add four elements, say u, v, w, z. We first add any two of them, 
then a third, and finally a fourth. Using the rules VS 1 and VS 4, we see 
that it does not matter in which order we perform the additions. This is 
exactly the same situation as we had with vectors. For example, we have 


((u + v) + w) + z = (u + (v + w)) + z
= ((v + w) + u) + z
= (v + w) + (u + z), etc.


Thus it is customary to leave out the parentheses, and write simply
u + v + w + z.


The same remark applies to the sum of any number n of elements of V. 
We shall use 0 to denote the number zero, and O to denote the 
element of any vector space V satisfying property VS 2. We also call it
zero, but there is never any possibility of confusion. We observe that
this zero element O is uniquely determined by condition VS 2. Indeed, if

v + w = v

then adding -v to both sides yields

-v + v + w = -v + v = O,

and the left-hand side is just O + w = w, so w = O.
Observe that for any element v in V we have 


0v = O.


Proof.
O = v + (-1)v = (1 - 1)v = 0v.


Similarly, if c is a number, then
cO = O.


Proof. We have cO = c(O + O) = cO + cO. Add -cO to both sides
to get cO = O.


Subspaces 


Let V be a vector space, and let W be a subset of V. Assume that W 
satisfies the following conditions. 


(i) If v, w are elements of W, their sum v + w is also an element of 
W. 
(ii) If v is an element of W and c a number, then cv is an element of 
W. 
(iii) The element O of V is also an element of W.


Then W itself is a vector space. Indeed, properties VS 1 through VS 8, 
being satisfied for all elements of V, are satisfied also for the elements of 
W. We shall call W a subspace of V. 


Example 3. Let $V = \mathbf{R}^n$ and let W be the set of vectors in V whose
last coordinate is equal to 0. Then W is a subspace of V, which we
could identify with $\mathbf{R}^{n-1}$.


Example 4. Let A be a vector in $\mathbf{R}^3$. Let W be the set of all elements
B in $\mathbf{R}^3$ such that $B \cdot A = 0$, i.e. such that B is perpendicular to A. Then
W is a subspace of $\mathbf{R}^3$. To see this, note that $O \cdot A = 0$, so that O is in
W. Next, suppose that B, C are perpendicular to A. Then

$$(B + C) \cdot A = B \cdot A + C \cdot A = 0,$$

so that B + C is also perpendicular to A. Finally, if x is a number, then

$$(xB) \cdot A = x(B \cdot A) = 0,$$

so that xB is perpendicular to A. This proves that W is a subspace of
$\mathbf{R}^3$.


More generally, if A is a vector in $\mathbf{R}^n$, then the set of all elements B in
$\mathbf{R}^n$ such that $B \cdot A = 0$ is a subspace of $\mathbf{R}^n$. The proof is the same as
when n = 3.
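
The three subspace conditions of Example 4 can be checked on particular vectors. The numbers below are hypothetical choices made for illustration; a check on a few vectors is of course not a proof, it only shows the computation behind each condition.

```python
def dot(X, Y):
    """Dot product of two n-tuples."""
    return sum(x * y for x, y in zip(X, Y))

A = (1, 2, 3)
O = (0, 0, 0)
B = (3, 0, -1)     # B . A = 3 + 0 - 3 = 0
C = (2, -1, 0)     # C . A = 2 - 2 + 0 = 0

B_plus_C = tuple(b + c for b, c in zip(B, C))   # sum of two elements of W
seven_B = tuple(7 * b for b in B)               # product of an element of W by a number

in_W = [dot(V, A) == 0 for V in (O, B, C, B_plus_C, seven_B)]
```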


Example 5. Let Sym(n x n) be the set of all symmetric n x n
matrices. Then Sym(n x n) is a subspace of the space of all n x n
matrices. Indeed, if A, B are symmetric and c is a number, then A + B
and cA are symmetric. Also the zero matrix is symmetric.


Example 6. If f, g are two continuous functions, then f + g is con- 
tinuous. If c is a number, then cf is continuous. The zero function 
is continuous. Hence the continuous functions form a subspace of the 
vector space of all functions. 

If f, g are two differentiable functions, then their sum f + g is differen- 
tiable. If c is a number, then cf is differentiable. The zero function is 
differentiable. Hence the differentiable functions form a subspace of the 
vector space of all functions. Furthermore, every differentiable function is
continuous. Hence the differentiable functions form a subspace of the 
vector space of continuous functions. 


Example 7. Let V be a vector space and let U, W be subspaces. We 
denote by $U \cap W$ the intersection of U and W, i.e. the set of elements
which lie both in U and W. Then $U \cap W$ is a subspace. For instance, if
U, W are two planes in 3-space passing through the origin, then in
general, their intersection will be a straight line passing through the
origin, as shown in Fig. 1.


Figure 1

Example 8. Let U, W be subspaces of a vector space V. By

$$U + W$$

we denote the set of all elements u + w with $u \in U$ and $w \in W$. Then we
leave it to the reader to verify that U + W is a subspace of V, said to be
generated by U and W, and called the sum of U and W.


Exercises III, §1


1. Let $A_1, \ldots, A_r$ be vectors in $\mathbf{R}^n$. Let W be the set of vectors B in $\mathbf{R}^n$ such that
$B \cdot A_i = 0$ for every $i = 1, \ldots, r$. Show that W is a subspace of $\mathbf{R}^n$.


2. Show that the following sets of elements in $\mathbf{R}^2$ form subspaces.
(a) The set of all (x, y) such that x = y.
(b) The set of all (x, y) such that x - y = 0.
(c) The set of all (x, y) such that x + 4y = 0.


3. Show that the following sets of elements in $\mathbf{R}^3$ form subspaces.
(a) The set of all (x, y, z) such that x + y + z = 0.
(b) The set of all (x, y, z) such that x = y and 2y = z.
(c) The set of all (x, y, z) such that x + y = 3z.


4. If U, W are subspaces of a vector space V, show that $U \cap W$ and U + W are
subspaces.


5. Let V be a subspace of $\mathbf{R}^n$. Let W be the set of elements of $\mathbf{R}^n$ which
are perpendicular to every element of V. Show that W is a subspace of $\mathbf{R}^n$.
This subspace W is often denoted by $V^{\perp}$, and is called V perp, or also the
orthogonal complement of V.


III, §2. Linear Combinations


Let V be a vector space, and let $v_1, \ldots, v_n$ be elements of V. We shall say
that $v_1, \ldots, v_n$ generate V if given an element $v \in V$ there exist numbers
$x_1, \ldots, x_n$ such that

$$v = x_1v_1 + \cdots + x_nv_n.$$


Example 1. Let $E_1, \ldots, E_n$ be the standard unit vectors in $\mathbf{R}^n$, so $E_i$
has component 1 in the i-th place, and component 0 in all other places.
Then $E_1, \ldots, E_n$ generate $\mathbf{R}^n$. Proof: given $X = (x_1, \ldots, x_n) \in \mathbf{R}^n$. Then

$$X = \sum_{i=1}^{n} x_iE_i,$$

so there exist numbers satisfying the condition of the definition.


Let V be an arbitrary vector space, and let $v_1, \ldots, v_n$ be elements of V.
Let $x_1, \ldots, x_n$ be numbers. An expression of type

$$x_1v_1 + \cdots + x_nv_n$$

is called a linear combination of $v_1, \ldots, v_n$. The numbers $x_1, \ldots, x_n$ are
then called the coefficients of the linear combination.


The set of all linear combinations of $v_1, \ldots, v_n$ is a subspace of V.


Proof. Let W be the set of all such linear combinations. Let $y_1, \ldots, y_n$
be numbers. Then

$$(x_1v_1 + \cdots + x_nv_n) + (y_1v_1 + \cdots + y_nv_n) = (x_1 + y_1)v_1 + \cdots + (x_n + y_n)v_n.$$

Thus the sum of two elements of W is again an element of W, i.e. a
linear combination of $v_1, \ldots, v_n$. Furthermore, if c is a number, then

$$c(x_1v_1 + \cdots + x_nv_n) = cx_1v_1 + \cdots + cx_nv_n$$

is a linear combination of $v_1, \ldots, v_n$, and hence is an element of W.
Finally,

$$O = 0v_1 + \cdots + 0v_n$$

is an element of W. This proves that W is a subspace of V.


The subspace W consisting of all linear combinations of $v_1, \ldots, v_n$ is
called the subspace generated by $v_1, \ldots, v_n$.


Example 2. Let $v_1$ be a non-zero element of a vector space V, and let
w be any element of V. The set of elements

$$w + tv_1 \qquad \text{with } t \in \mathbf{R}$$

is called the line passing through w in the direction of $v_1$. We have
already met such lines in Chapter I, §5. If w = O, then the line consisting
of all scalar multiples $tv_1$ with $t \in \mathbf{R}$ is a subspace, generated by $v_1$.

Let $v_1$, $v_2$ be elements of a vector space V, and assume that neither is
a scalar multiple of the other. The subspace generated by $v_1$, $v_2$ is called
the plane generated by $v_1$, $v_2$. It consists of all linear combinations

$$t_1v_1 + t_2v_2 \qquad \text{with } t_1, t_2 \text{ arbitrary numbers.}$$

This plane passes through the origin, as one sees by putting $t_1 = t_2 = 0$.


Plane passing 
through the origin 


Figure 2 


We obtain the most general notion of a plane by the following opera- 
tion. Let S be an arbitrary subset of V. Let P be an element of V. If we 
add P to all elements of S, then we obtain what is called the translation 
of S by P. It consists of all elements P + v with v in S. 


Example 3. Let $v_1$, $v_2$ be elements of a vector space V such that
neither is a scalar multiple of the other. Let P be an element of V. We
define the plane passing through P, parallel to $v_1$, $v_2$ to be the set of all
elements

$$P + t_1v_1 + t_2v_2$$

where $t_1$, $t_2$ are arbitrary numbers. This notion of plane is the analogue,
with two elements $v_1$, $v_2$, of the notion of parametrized line considered in
Chapter I.


Warning. Usually such a plane does not pass through the origin, as
shown on Fig. 3. Thus such a plane is not a subspace of V. If we take
P = O, however, then the plane is a subspace.

Plane not passing
through the origin


Figure 3 


Sometimes it is interesting to restrict the coefficients of a linear com- 
bination. We give a number of examples below. 


Example 4. Let V be a vector space and let v, u be elements of V. We
define the line segment between v and v + u to be the set of all points

v + tu, 0 ≤ t ≤ 1.

This line segment is illustrated in the following picture.

[Figure 4: the segment from v to v + u, with the point v + tu marked]

For instance, if t = 1/2, then v + (1/2)u is the point midway between v
and v + u. Similarly, if t = 1/3, then v + (1/3)u is the point one third of
the way between v and v + u (Fig. 5).

[Figure 5: panels (a) and (b), segments from v to v + u]

If v, w are elements of V, let u = w - v. Then the line segment
between v and w is the set of all points v + tu, or

v + t(w - v), 0 ≤ t ≤ 1.

[Figure 6: the segment from v to w, with the point v + t(w - v) marked]

Observe that we can rewrite the expression for these points in the form

(1) (1 - t)v + tw, 0 ≤ t ≤ 1,

and letting s = 1 - t, t = 1 - s, we can also write it as

sv + (1 - s)w, 0 ≤ s ≤ 1.

Finally, we can write the points of our line segment in the form

(2) t_1 v + t_2 w with t_1, t_2 ≥ 0 and t_1 + t_2 = 1.

Indeed, letting t = t_2, we see that every point which can be written in
the form (2) satisfies (1). Conversely, we let t_1 = 1 - t and t_2 = t and
see that every point of the form (1) can be written in the form (2).
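
For readers who like to experiment on a computer, the two descriptions of
the segment can be compared numerically. What follows is only a minimal
sketch in Python, not part of the text proper: the function names
segment_point and convex_pair are our own, vectors are represented as
tuples, and exact fractions are used so that the condition t_1 + t_2 = 1
can be tested without rounding.

```python
from fractions import Fraction

def segment_point(v, w, t):
    """Return the point (1 - t)v + tw of form (1), for 0 <= t <= 1."""
    if not 0 <= t <= 1:
        raise ValueError("t must lie between 0 and 1")
    return tuple((1 - t) * a + t * b for a, b in zip(v, w))

def convex_pair(v, w, t1, t2):
    """Return t1*v + t2*w of form (2), for t1, t2 >= 0 and t1 + t2 = 1."""
    if t1 < 0 or t2 < 0 or t1 + t2 != 1:
        raise ValueError("need t1, t2 >= 0 and t1 + t2 = 1")
    return tuple(t1 * a + t2 * b for a, b in zip(v, w))

v, w = (1, 2), (4, 8)
t = Fraction(1, 3)
p = segment_point(v, w, t)        # one third of the way from v to w
q = convex_pair(v, w, 1 - t, t)   # the same point, written in form (2)
```

With t_1 = 1 - t and t_2 = t the two functions return the same point, as
the argument above predicts.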


Example 5. Let v, w be elements of a vector space V. Assume that
neither is a scalar multiple of the other. We define the parallelogram
spanned by v, w to be the set of all points

t_1 v + t_2 w, 0 ≤ t_i ≤ 1 for i = 1, 2.

This definition is clearly justified since t_1 v is a point of the segment
between O and v (Fig. 7), and t_2 w is a point of the segment between O
and w. For all values of t_1, t_2 ranging independently between 0 and 1,
we see geometrically that t_1 v + t_2 w describes all points of the
parallelogram.

[Figure 7: the parallelogram spanned by v and w, with the points v + w,
t_2 w and t_1 v + t_2 w marked]


We obtain the most general parallelogram (Fig. 8) by taking the
translation of the parallelogram just described. Thus if u is an element of
V, the translation by u of the parallelogram spanned by v and w consists
of all points

u + t_1 v + t_2 w, 0 ≤ t_i ≤ 1 for i = 1, 2.

[Figure 8: the translated parallelogram, with the points u, u + v, u + w
and the origin O marked]


Similarly, in higher dimensions, let v_1, v_2, v_3 be elements of a vector
space V. We define the box spanned by these elements to be the set of
linear combinations

t_1 v_1 + t_2 v_2 + t_3 v_3 with 0 ≤ t_i ≤ 1.

We draw the picture when v_1, v_2, v_3 are in general position:

[Figure 9: the box spanned by v_1, v_2, v_3]

There may be degenerate cases, which will lead us into the notion of
linear dependence a little later.


Exercises III, §2 


1. Let A_1, ..., A_r be generators of a subspace V of R^n. Let W be the set
of all elements of R^n which are perpendicular to A_1, ..., A_r. Show that
the vectors of W are perpendicular to every element of V.

2. Draw the parallelogram spanned by the vectors (1, 2) and (-1, 1) in R^2.

3. Draw the parallelogram spanned by the vectors (2, -1) and (1, 3) in R^2.


III, §3. Convex Sets


Let S be a subset of a vector space V. We shall say that S is convex
if given points P, Q in S then the line segment between P and Q is
contained in S. In Fig. 10, the set on the left is convex. The set on the
right is not convex since the line segment between P and Q is not
entirely contained in S.

[Figure 10: a convex set (left) and a set which is not convex (right)]

We recall that the line segment between P and Q consists of all points

(1 - t)P + tQ with 0 ≤ t ≤ 1.


This gives us a simple test to determine whether a set is convex or not.

Example 1. Let S be the parallelogram spanned by two vectors v_1, v_2,
so S is the set of linear combinations

t_1 v_1 + t_2 v_2 with 0 ≤ t_i ≤ 1.

We wish to prove that S is convex. Let

P = t_1 v_1 + t_2 v_2 and Q = s_1 v_1 + s_2 v_2

be points in S, and let 0 ≤ t ≤ 1. Then

(1 - t)P + tQ = (1 - t)(t_1 v_1 + t_2 v_2) + t(s_1 v_1 + s_2 v_2)
= (1 - t)t_1 v_1 + (1 - t)t_2 v_2 + t s_1 v_1 + t s_2 v_2
= r_1 v_1 + r_2 v_2,

where

r_1 = (1 - t)t_1 + t s_1 and r_2 = (1 - t)t_2 + t s_2.

But we have

0 ≤ (1 - t)t_1 + t s_1 ≤ (1 - t) + t = 1

and

0 ≤ (1 - t)t_2 + t s_2 ≤ (1 - t) + t = 1.

Hence

(1 - t)P + tQ = r_1 v_1 + r_2 v_2 with 0 ≤ r_i ≤ 1.

This proves that (1 - t)P + tQ is in the parallelogram, which is therefore
convex.
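
The inequalities of Example 1 can also be spot-checked on a computer. The
sketch below is an illustration only and proves nothing: it samples a few
values of t_1, t_2, s_1, s_2 and t between 0 and 1 and confirms that the
coefficients r_1, r_2 stay between 0 and 1. The names mix and
in_parallelogram are our own choices.

```python
from fractions import Fraction
from itertools import product

def mix(p_coeffs, q_coeffs, t):
    """Coefficients r_i = (1 - t) t_i + t s_i of the point (1 - t)P + tQ."""
    return tuple((1 - t) * ti + t * si for ti, si in zip(p_coeffs, q_coeffs))

def in_parallelogram(coeffs):
    """True when every coefficient lies between 0 and 1."""
    return all(0 <= r <= 1 for r in coeffs)

# Sample the values 0, 1/4, 1/2, 3/4, 1 for t_1, t_2, s_1, s_2 and t.
grid = [Fraction(k, 4) for k in range(5)]
always_inside = all(
    in_parallelogram(mix((t1, t2), (s1, s2), t))
    for t1, t2, s1, s2, t in product(grid, repeat=5)
)
```

A sampled check of this kind can only fail to find a counterexample; the
proof above is what shows that none exists.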


Example 2. Half planes. Consider a linear equation like

2x - 3y = 6.

This is the equation of a line as shown on Fig. 11.

[Figure 11: the line 2x - 3y = 6]

The inequalities

2x - 3y ≤ 6 and 2x - 3y ≥ 6

determine two half planes; one of them lies below the line and the other
lies above the line, as shown on Fig. 12.

[Figure 12: the two half planes determined by the line]

Let A = (2, -3). We can, and should, write the linear inequalities in
the form

A·X ≥ 6 and A·X ≤ 6,

where X = (x, y). Prove as Exercise 2 that each half plane is convex.
This is clear intuitively from the picture, at least in R^2, but your proof
should be valid for the analogous situation in R^n.


Theorem 3.1. Let P_1, ..., P_n be points of a vector space V. Let S be the
set of all linear combinations

t_1 P_1 + ... + t_n P_n

with 0 ≤ t_i and t_1 + ... + t_n = 1. Then S is convex.

Proof. Let

P = t_1 P_1 + ... + t_n P_n

and

Q = s_1 P_1 + ... + s_n P_n

with 0 ≤ t_i, 0 ≤ s_i, and

t_1 + ... + t_n = 1,
s_1 + ... + s_n = 1.

Let 0 ≤ t ≤ 1. Then:

(1 - t)P + tQ = (1 - t)t_1 P_1 + ... + (1 - t)t_n P_n
+ t s_1 P_1 + ... + t s_n P_n
= [(1 - t)t_1 + t s_1]P_1 + ... + [(1 - t)t_n + t s_n]P_n.

We have 0 ≤ (1 - t)t_i + t s_i for all i, and

(1 - t)t_1 + t s_1 + ... + (1 - t)t_n + t s_n
= (1 - t)(t_1 + ... + t_n) + t(s_1 + ... + s_n)
= (1 - t) + t
= 1.

This proves our theorem.


In the next theorem, we shall prove that the set of all linear
combinations

t_1 P_1 + ... + t_n P_n with 0 ≤ t_i and t_1 + ... + t_n = 1

is the smallest convex set containing P_1, ..., P_n. For example, suppose
that P_1, P_2, P_3 are three points in the plane not on a line. Then it is
geometrically clear that the smallest convex set containing these three
points is the triangle having these points as vertices.

[Figure 13: the triangle with vertices P_1, P_2, P_3]


Thus it is natural to take as definition of a triangle the following
property, valid in any vector space.

Let P_1, P_2, P_3 be three points in a vector space V, not lying on a
line. Then the triangle spanned by these points is the set of all
combinations

t_1 P_1 + t_2 P_2 + t_3 P_3 with 0 ≤ t_i and t_1 + t_2 + t_3 = 1.

When we deal with more than three points, then the set of linear
combinations as in Theorem 3.1 looks as in the following figure.

[Figure 14: the convex set spanned by more than three points]


We shall call the convex set of Theorem 3.1 the convex set spanned by
P_1, ..., P_n. Although we shall not need the next result, it shows that
this convex set is the smallest convex set containing all the points
P_1, ..., P_n. Omit the proof if you can't handle the argument by induction.


Theorem 3.2. Let P_1, ..., P_n be points of a vector space V. Any convex
set which contains P_1, ..., P_n also contains all linear combinations

t_1 P_1 + ... + t_n P_n

with 0 ≤ t_i for all i and t_1 + ... + t_n = 1.


Proof. We prove this by induction. If n = 1, then t_1 = 1, and our
assertion is obvious. Assume the theorem proved for some integer
n - 1 ≥ 1. We shall prove it for n. Let t_1, ..., t_n be numbers satisfying
the conditions of the theorem. Let S' be a convex set containing
P_1, ..., P_n. We must show that S' contains all linear combinations

t_1 P_1 + ... + t_n P_n.

If t_n = 1, then our assertion is trivial because t_1 = ... = t_{n-1} = 0.
Suppose that t_n ≠ 1. Then the linear combination t_1 P_1 + ... + t_n P_n
is equal to

(1 - t_n)[ (t_1/(1 - t_n)) P_1 + ... + (t_{n-1}/(1 - t_n)) P_{n-1} ] + t_n P_n.

Let

s_i = t_i/(1 - t_n) for i = 1, ..., n - 1.

Then s_i ≥ 0 and s_1 + ... + s_{n-1} = 1 so that by induction, we conclude
that the point

Q = s_1 P_1 + ... + s_{n-1} P_{n-1}

lies in S'. But then

(1 - t_n)Q + t_n P_n = t_1 P_1 + ... + t_n P_n

lies in S' by definition of a convex set, as was to be shown.
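
The induction in this proof can be read as a recipe: a combination
t_1 P_1 + ... + t_n P_n is reached by repeatedly taking a point on a
segment between two points already obtained. The following Python sketch
follows that recipe step by step. It is offered only as an illustration;
the function names are ours, and we assume that the weights are given as
exact fractions so that the condition t_1 + ... + t_n = 1 can be tested.

```python
from fractions import Fraction

def segment_point(p, q, t):
    """Return (1 - t)p + tq, a point of the segment between p and q."""
    return tuple((1 - t) * a + t * b for a, b in zip(p, q))

def combination_by_segments(points, weights):
    """Build t_1 P_1 + ... + t_n P_n using only points on segments."""
    if any(t < 0 for t in weights) or sum(weights) != 1:
        raise ValueError("need weights >= 0 with sum equal to 1")
    if len(points) == 1:
        return tuple(points[0])          # case n = 1, where t_1 = 1
    t_n = weights[-1]
    if t_n == 1:
        return tuple(points[-1])         # all other weights are 0
    # s_i = t_i / (1 - t_n), as in the proof
    s = [t / (1 - t_n) for t in weights[:-1]]
    q = combination_by_segments(points[:-1], s)
    return segment_point(q, points[-1], t_n)

triangle = [(0, 0), (4, 0), (0, 4)]
weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
point = combination_by_segments(triangle, weights)
```

Each recursive call corresponds to one use of the induction hypothesis, and
the final call to segment_point is the step where convexity of S' is used.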


Exercises III, §3

1. Let S be the parallelogram consisting of all linear combinations
t_1 v_1 + t_2 v_2 with 0 ≤ t_1 ≤ 1 and 0 ≤ t_2 ≤ 1. Prove that S is convex.

2. Let A be a non-zero vector in R^n and let c be a fixed number. Show that
the set of all elements X in R^n such that A·X ≥ c is convex.

3. Let S be a convex set in a vector space. If c is a number, denote by cS
the set of all elements cv with v in S. Show that cS is convex.

4. Let S_1 and S_2 be convex sets. Show that the intersection S_1 ∩ S_2 is
convex.

5. Let S be a convex set in a vector space V. Let w be an arbitrary element
of V. Let w + S be the set of all elements w + v with v in S. Show that
w + S is convex.


III, §4. Linear Independence

Let V be a vector space, and let v_1, ..., v_n be elements of V. We shall
say that v_1, ..., v_n are linearly dependent if there exist numbers
a_1, ..., a_n not all equal to 0 such that

a_1 v_1 + ... + a_n v_n = O.

If there do not exist such numbers, then we say that v_1, ..., v_n are
linearly independent. In other words, vectors v_1, ..., v_n are linearly
independent if and only if the following condition is satisfied:

Let a_1, ..., a_n be numbers such that

a_1 v_1 + ... + a_n v_n = O;

then a_i = 0 for all i = 1, ..., n.

Example 1. Let V = R^n and consider the vectors

E_1 = (1, 0, ..., 0)
...
E_n = (0, 0, ..., 1).

Then E_1, ..., E_n are linearly independent. Indeed, let a_1, ..., a_n be
numbers such that a_1 E_1 + ... + a_n E_n = O. Since

a_1 E_1 + ... + a_n E_n = (a_1, ..., a_n),

it follows that all a_i = 0.

Example 2. Show that the vectors (1, 1) and (-3, 2) are linearly
independent.

Let a, b be two numbers such that

a(1, 1) + b(-3, 2) = O.

Writing this equation in terms of components, we find

a - 3b = 0, a + 2b = 0.

This is a system of two equations which we solve for a and b.
Subtracting the second from the first, we get -5b = 0, whence b = 0.
Substituting in either equation, we find a = 0. Hence, a, b are both 0,
and our vectors are linearly independent.


If elements v_1, ..., v_n of V generate V and in addition are linearly
independent, then {v_1, ..., v_n} is called a basis of V. We shall also say
that the elements v_1, ..., v_n constitute or form a basis of V.


Example 3. The vectors E_1, ..., E_n of Example 1 form a basis of R^n.
To prove this we have to prove that they are linearly independent, which
was already done in Example 1; and that they generate R^n. Given an
element A = (a_1, ..., a_n) of R^n we can write A as a linear combination

A = a_1 E_1 + ... + a_n E_n,

so by definition, E_1, ..., E_n generate R^n. Hence they form a basis.


However, there are many other bases. Let us look at n = 2. We shall
find out that any two vectors which are not parallel form a basis of R^2.
Let us first consider an example.


[Figure 15: two vectors v_1, v_2 in the plane. If v_1, v_2 are as drawn,
they form a basis of R^2.]


Example 4. Show that the vectors (1, 1) and (-1, 2) form a basis of
R^2.

We have to show that they are linearly independent and that they
generate R^2. To prove linear independence, suppose that a, b are
numbers such that

a(1, 1) + b(-1, 2) = (0, 0).

Then

a - b = 0, a + 2b = 0.

Subtracting the first equation from the second yields 3b = 0, so that
b = 0. But then from the first equation, a = 0, thus proving that our
vectors are linearly independent.

Next, we must show that (1, 1) and (-1, 2) generate R^2. Let (s, t) be
an arbitrary element of R^2. We have to show that there exist numbers x,
y such that

x(1, 1) + y(-1, 2) = (s, t).

In other words, we must solve the system of equations

x - y = s,
x + 2y = t.

Again subtract the first equation from the second. We find

3y = t - s,

whence

y = (t - s)/3,

and finally

x = y + s = (t - s)/3 + s.

This proves that (1, 1) and (-1, 2) generate R^2, and concludes the proof
that they form a basis of R^2.


The general story for R^2 is expressed in the following theorem.


Theorem 4.1. Let (a, b) and (c, d) be two vectors in R^2.

(i) They are linearly dependent if and only if ad - bc = 0.
(ii) If they are linearly independent, then they form a basis of R^2.

Proof. First work it out as an exercise (see Exercise 4). If you can't
do it, you will find the proof in the answer section. It parallels closely
the procedure of Example 4.
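
Theorem 4.1 turns the question of linear independence in R^2 into a single
computation. As a small illustration, here is a Python sketch of the test;
the names det2 and is_basis_of_plane are ours and do not come from the
text.

```python
def det2(u, v):
    """Return ad - bc for the vectors u = (a, b) and v = (c, d)."""
    (a, b), (c, d) = u, v
    return a * d - b * c

def is_basis_of_plane(u, v):
    """By Theorem 4.1, u and v form a basis of R^2 exactly when ad - bc != 0."""
    return det2(u, v) != 0

example_2 = det2((1, 1), (-3, 2))   # the vectors of Example 2
example_4 = det2((1, 1), (-1, 2))   # the vectors of Example 4
```

Both numbers are non-zero, in agreement with the conclusions of Examples 2
and 4, whereas a pair such as (1, 2) and (2, 4) gives 0.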


Let V be a vector space, and let {v_1, ..., v_n} be a basis of V. The
elements of V can be represented by n-tuples relative to this basis, as
follows. If an element v of V is written as a linear combination

v = x_1 v_1 + ... + x_n v_n

of the basis elements, then we call (x_1, ..., x_n) the coordinates of v
with respect to our basis, and we call x_i the i-th coordinate. The
coordinates with respect to the usual basis E_1, ..., E_n of R^n are simply
the coordinates as defined in Chapter I, §1.


The following theorem shows that there can only be one set of
coordinates for a given vector.


Theorem 4.2. Let V be a vector space. Let v_1, ..., v_n be linearly
independent elements of V. Let x_1, ..., x_n and y_1, ..., y_n be numbers
such that

x_1 v_1 + ... + x_n v_n = y_1 v_1 + ... + y_n v_n.

Then we must have x_i = y_i for all i = 1, ..., n.

Proof. Subtract the right-hand side from the left-hand side. We get

x_1 v_1 - y_1 v_1 + ... + x_n v_n - y_n v_n = O.

We can write this relation also in the form

(x_1 - y_1)v_1 + ... + (x_n - y_n)v_n = O.

By definition, we must have x_i - y_i = 0 for all i = 1, ..., n, thereby
proving our assertion.

The theorem expresses the fact that when an element is written as a
linear combination of v_1, ..., v_n, then its coefficients x_1, ..., x_n
are uniquely determined. This is true only when v_1, ..., v_n are linearly
independent.


Example 5. Find the coordinates of (1, 0) with respect to the two
vectors (1, 1) and (-1, 2).

We must find numbers a, b such that

a(1, 1) + b(-1, 2) = (1, 0).

Writing this equation in terms of coordinates, we find

a - b = 1, a + 2b = 0.

Solving for a and b in the usual manner yields b = -1/3 and a = 2/3.
Hence the coordinates of (1, 0) with respect to (1, 1) and (-1, 2) are

(2/3, -1/3).
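
The computation of Example 5 can be checked by machine. The sketch below
does not carry out the elimination step by step as in the text; instead it
uses the explicit solution of the 2 x 2 system, which is a substitution of
technique on our part. The function name coordinates is ours, and we assume
that all components are integers so that exact fractions can be formed.

```python
from fractions import Fraction

def coordinates(x, u, v):
    """Return (p, q) with p*u + q*v = x, for vectors x, u, v in the plane.

    Assumes integer components. Raises ValueError when ad - bc = 0,
    that is, when u = (a, b) and v = (c, d) are linearly dependent.
    """
    (a, b), (c, d) = u, v
    s, t = x
    det = a * d - b * c
    if det == 0:
        raise ValueError("u and v do not form a basis of the plane")
    p = Fraction(s * d - c * t, det)
    q = Fraction(a * t - b * s, det)
    return p, q

p, q = coordinates((1, 0), (1, 1), (-1, 2))
# Recompute p*(1, 1) + q*(-1, 2) to confirm that it gives back (1, 0).
check = (p * 1 + q * (-1), p * 1 + q * 2)
```

By Theorem 4.2 the pair (p, q) found in this way is the only possible one.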


Example 6. The two functions e^t, e^{2t} are linearly independent. To
prove this, suppose that there are numbers a, b such that

ae^t + be^{2t} = 0

(for all values of t). Differentiate this relation. We obtain

ae^t + 2be^{2t} = 0.

Subtract the first from the second relation. We obtain be^{2t} = 0, and
hence b = 0. From the first relation, it follows that ae^t = 0, and hence
a = 0. Hence e^t, e^{2t} are linearly independent.


Example 7. Let V be the vector space of all functions of a variable t.
Let f_1, ..., f_n be n functions. To say that they are linearly dependent
is to say that there exist n numbers a_1, ..., a_n not all equal to 0 such
that

a_1 f_1(t) + ... + a_n f_n(t) = 0

for all values of t.


Warning. We emphasize that linear dependence for functions means
that the above relation holds for all values of t. For instance, consider
the relation

a sin t + b cos t = 0,

where a, b are two fixed numbers not both zero. There may be some
values of t for which the above equation is satisfied. For instance, if
a ≠ 0 we then can solve

sin t / cos t = -b/a,

or in other words, tan t = -b/a to get at least one solution. However, the
above relation cannot hold for all values of t, and consequently sin t,
cos t are linearly independent, as functions.
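
One way to see concretely why the relation cannot hold for all t is to
evaluate it at two values of t. If a f(t) + b g(t) = 0 holds at t_1 and at
t_2, then a and b give a relation between the vectors (f(t_1), f(t_2)) and
(g(t_1), g(t_2)) of R^2, and Theorem 4.1 applies. The Python sketch below
tests this numerically. The function name and the tolerance are our own
choices, and this sampling argument is an alternative to the reasoning in
the Warning, not a restatement of it.

```python
import math

def relation_must_be_trivial(f, g, t1, t2, tol=1e-12):
    """True when a*f(t) + b*g(t) = 0 at t1 and t2 already forces a = b = 0.

    The test is that the number f(t1)g(t2) - f(t2)g(t1) is non-zero,
    up to the floating-point tolerance tol.
    """
    d = f(t1) * g(t2) - f(t2) * g(t1)
    return abs(d) > tol

sin_cos = relation_must_be_trivial(math.sin, math.cos, 0.0, math.pi / 2)
sin_twice_sin = relation_must_be_trivial(
    math.sin, lambda t: 2 * math.sin(t), 1.0, 2.0
)
```

A value of True proves independence. A value of False proves nothing by
itself, since another choice of t_1, t_2 might succeed; for sin t and
2 sin t no choice succeeds, because these functions are linearly dependent.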
Example 8. Let V be the vector space of functions generated by the
two functions e^t, e^{2t}. Then the coordinates of the function

3e^t + 5e^{2t}

with respect to the basis {e^t, e^{2t}} are (3, 5).

When dealing with two vectors v, w there is another convenient way
of expressing linear independence.


Theorem 4.3. Let v, w be elements of a vector space V. They are
linearly dependent if and only if one of them is a scalar multiple of the
other, i.e. there is a number c ≠ 0 such that we have v = cw or w = cv.


Proof. Left as an exercise, cf. Exercise 5.


In the light of this theorem, the condition imposed in various
examples in the preceding section could be formulated in terms of two
vectors being linearly independent.


Exercises III, §4


1. Show that the following vectors are linearly independent.

(a) (1, 1, 1) and (0, 1, -2)
(b) (1, 0) and (1, 1)
(c) (-1, 1, 0) and (0, 1, 2)
(d) (2, -1) and (1, 0)
(e) (π, 0) and (0, 1)
(f) (1, 2) and (1, 3)
(g) (1, 1, 0), (1, 1, 1), and (0, 1, -1)
(h) (0, 1, 1), (0, 2, 1), and (1, 5, 3)


2. Express the given vector X as a linear combination of the given vectors A, B, 
and find the coordinates of X with respect to A, B. 
(a) X = (1, 0), A = (1, 1), B = (0, 1)
(b) X = (2, 1), A = (1, -1), B = (1, 1)
(c) X = (1, 1), A = (2, 1), B = (-1, 0)
(d) X = (4, 3), A = (2, 1), B = (-1, 0)


3. Find the coordinates of the vector X with respect to the vectors A, B, C.
(a) X = (1, 0, 0), A = (1, 1, 1), B = (-1, 1, 0), C = (1, 0, -1)
(b) X = (1, 1, 1), A = (0, 1, -1), B = (1, 1, 0), C = (1, 0, 2)
(c) X = (0, 0, 1), A = (1, 1, 1), B = (-1, 1, 0), C = (1, 0, -1)


4. Let (a, b) and (c, d) be two vectors in R^2.
(i) If ad - bc ≠ 0, show that they are linearly independent.
(ii) If they are linearly independent, show that ad - bc ≠ 0.
(iii) If ad - bc ≠ 0 show that they form a basis of R^2.


5. (a) Let v, w be elements of a vector space. If v, w are linearly
dependent, show that there is a number c such that w = cv, or v = cw.
(b) Conversely, let v, w be elements of a vector space, and assume that
there exists a number c such that w = cv. Show that v, w are linearly
dependent.

6. Let A_1, ..., A_r be vectors in R^n, and assume that they are mutually
perpendicular, in other words A_i ⊥ A_j if i ≠ j. Also assume that none of
them is O. Prove that they are linearly independent.


7. Consider the vector space of all functions of a variable t. Show that
the following pairs of functions are linearly independent.
(a) 1, t (b) t, t^2 (c) t, t^4 (d) e^t, t (e) te^t, e^{2t} (f) sin t, cos t
(g) t, sin t (h) sin t, sin 2t (i) cos t, cos 3t


8. Consider the vector space of functions defined for t > 0. Show that the
following pairs of functions are linearly independent.
(a) t, 1/t (b) e^t, log t

9. What are the coordinates of the function 3 sin t + 5 cos t = f(t) with
respect to the basis {sin t, cos t}?


10. Let D be the derivative d/dt. Let f(t) be as in Exercise 9. What are the 
coordinates of the function Df(t) with respect to the basis of Exercise 9? 


In each of the following cases, exhibit a basis for the given space, and prove 
that it is a basis. 


11. The space of 2 x 2 matrices. 

12. The space of m x n matrices. 

13. The space of n x n matrices all of whose components are 0 except possibly 
the diagonal components. 


14. The upper triangular matrices, i.e. matrices of the following type:

a_11 a_12 ... a_1n
0    a_22 ... a_2n
...
0    0    ... a_nn


15. (a) The space of symmetric 2 x 2 matrices. 
(b) The space of symmetric 3 x 3 matrices. 
16. The space of symmetric n x n matrices. 


III, §5. Dimension


We ask the question: Can we find three linearly independent elements in
R^2? For instance, are the elements

A = (1, 2), B = (-5, 7), C = (10, 4)

linearly independent? If you write down the linear equations expressing
the relation

xA + yB + zC = O,

you will find that you can solve them for x, y, z not all equal to 0.
Namely, these equations are:

x - 5y + 10z = 0,
2x + 7y + 4z = 0.

This is a system of two homogeneous equations in three unknowns, and
we know by Theorem 2.1 of Chapter II that we can find a non-trivial
solution (x, y, z) not all equal to zero. Hence A, B, C are linearly
dependent.
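
Theorem 2.1 of Chapter II guarantees that a non-trivial solution exists
without exhibiting one. For this particular system, one solution can be
written down with the cross product of the two rows of coefficients, since
the cross product of two vectors in R^3 is perpendicular to both. This
device is our own addition and is not the method of the text; the Python
sketch below uses it and then checks the resulting relation among A, B, C.

```python
def cross(u, v):
    """Cross product of two vectors in R^3; perpendicular to both u and v."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )

A, B, C = (1, 2), (-5, 7), (10, 4)
row1 = (A[0], B[0], C[0])    # coefficients of  x - 5y + 10z = 0
row2 = (A[1], B[1], C[1])    # coefficients of 2x + 7y +  4z = 0

x, y, z = cross(row1, row2)  # a solution of both equations
relation = tuple(x * a + y * b + z * c for a, b, c in zip(A, B, C))
```

The tuple relation holds the components of xA + yB + zC, which should both
be 0, while x, y, z themselves are not all 0.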

We shall see in a moment that this is a general phenomenon. In R^n,
we cannot find more than n linearly independent vectors. Furthermore,
we shall see that any n linearly independent elements of R^n must
generate R^n, and hence form a basis. Finally, we shall also see that if
one basis of a vector space has n elements, and another basis has m
elements, then m = n. In short, two bases must have the same number of
elements. This property will allow us to define the dimension of a vector
space as the number of elements in any basis. We now develop these ideas
systematically.


Theorem 5.1. Let V be a vector space, and let {v_1, ..., v_m} generate V.
Let w_1, ..., w_n be elements of V and assume that n > m. Then
w_1, ..., w_n are linearly dependent.


Proof. Since {v_1, ..., v_m} generate V, there exist numbers (a_ij) such
that we can write

w_1 = a_11 v_1 + ... + a_m1 v_m
...
w_n = a_1n v_1 + ... + a_mn v_m.

If x_1, ..., x_n are numbers, then

x_1 w_1 + ... + x_n w_n
= (x_1 a_11 + ... + x_n a_1n)v_1 + ... + (x_1 a_m1 + ... + x_n a_mn)v_m

(just add up the coefficients of v_1, ..., v_m vertically downward).
According to Theorem 2.1 of Chapter II, the system of equations

x_1 a_11 + ... + x_n a_1n = 0
...
x_1 a_m1 + ... + x_n a_mn = 0

has a non-trivial solution, because n > m. In view of the preceding
remark, such a solution (x_1, ..., x_n) is such that

x_1 w_1 + ... + x_n w_n = O,

as desired.


Theorem 5.2. Let V be a vector space and suppose that one basis has n
elements, and another basis has m elements. Then m = n.


Proof. We apply Theorem 5.1 to the two bases. Theorem 5.1 implies
that both alternatives n > m and m > n are impossible, and hence m = n.


Let V be a vector space having a basis consisting of n elements. We
shall say that n is the dimension of V. If V consists of O alone, then V
does not have a basis, and we shall say that V has dimension 0.

We may now reformulate the definitions of a line and a plane in 
an arbitrary vector space V. A line passing through the origin is 
simply a one-dimensional subspace. A plane passing through the origin 
is simply a two-dimensional subspace. 

An arbitrary line is obtained as the translation of a one-dimensional
subspace. An arbitrary plane is obtained as the translation of a
two-dimensional subspace. When a basis {v_1} has been selected for a
one-dimensional space, then the points on a line are expressed in the
usual form

P + t_1 v_1 with all possible numbers t_1.

When a basis {v_1, v_2} has been selected for a two-dimensional space, then
the points on a plane are expressed in the form

P + t_1 v_1 + t_2 v_2 with all possible numbers t_1, t_2.

Let {v_1, ..., v_n} be a set of elements of a vector space V. Let r be a
positive integer ≤ n. We shall say that {v_1, ..., v_r} is a maximal subset
of linearly independent elements if v_1, ..., v_r are linearly independent,
and if in addition, given any v_i with i > r, the elements
v_1, ..., v_r, v_i are linearly dependent.

The next theorem gives us a useful criterion to determine when a set 
of elements of a vector space is a basis. 


Theorem 5.3. Let {v_1, ..., v_n} be a set of generators of a vector space
V. Let {v_1, ..., v_r} be a maximal subset of linearly independent
elements. Then {v_1, ..., v_r} is a basis of V.


Proof. We must prove that v_1, ..., v_r generate V. We shall first prove
that each v_i (for i > r) is a linear combination of v_1, ..., v_r. By
hypothesis, given v_i, there exist numbers x_1, ..., x_r, y not all 0 such
that

x_1 v_1 + ... + x_r v_r + y v_i = O.

Furthermore, y ≠ 0, because otherwise, we would have a relation of linear
dependence for v_1, ..., v_r. Hence we can solve for v_i, namely

v_i = (x_1/(-y)) v_1 + ... + (x_r/(-y)) v_r,

thereby showing that v_i is a linear combination of v_1, ..., v_r.

Next, let v be any element of V. There exist numbers c_1, ..., c_n such
that

v = c_1 v_1 + ... + c_n v_n.

In this relation, we can replace each v_i (i > r) by a linear combination
of v_1, ..., v_r. If we do this, and then collect terms, we find that we
have expressed v as a linear combination of v_1, ..., v_r. This proves
that v_1, ..., v_r generate V, and hence form a basis of V.
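
Theorem 5.3 suggests a practical procedure for vectors in R^n: run through
a list of generators, keeping a vector only when it is linearly independent
of those already kept. Every discarded vector is then dependent on the
kept ones, so the kept vectors play the role of the maximal subset in the
theorem. The Python sketch below is one way to carry this out. It is an
illustration under our own assumptions: independence is tested by
elimination with exact fractions, and the names rank and
maximal_independent_subset are ours.

```python
from fractions import Fraction

def rank(vectors):
    """Number of linearly independent vectors, found by elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        # find a row at or below position r with a non-zero entry in col
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def maximal_independent_subset(generators):
    """Keep each generator that is independent of those kept so far."""
    kept = []
    for v in generators:
        if rank(kept + [v]) == len(kept) + 1:
            kept.append(v)
    return kept

basis = maximal_independent_subset([(1, 2), (2, 4), (-5, 7), (10, 4)])
```

Here (2, 4) is discarded as a multiple of (1, 2), and (10, 4) is discarded
because three vectors in R^2 cannot be linearly independent.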


We shall now give criteria which allow us to tell when elements of a 
vector space constitute a basis. 

Let v_1, ..., v_n be linearly independent elements of a vector space V. We
shall say that they form a maximal set of linearly independent elements of
V if given any element w of V, the elements w, v_1, ..., v_n are linearly
dependent.


Theorem 5.4. Let V be a vector space, and {v_1, ..., v_n} a maximal set of
linearly independent elements of V. Then {v_1, ..., v_n} is a basis of V.


Proof. We must now show that v_1, ..., v_n generate V, i.e. that every
element of V can be expressed as a linear combination of v_1, ..., v_n.
Let w be an element of V. The elements w, v_1, ..., v_n of V must be
linearly dependent by hypothesis, and hence there exist numbers
x_0, x_1, ..., x_n not all 0 such that

x_0 w + x_1 v_1 + ... + x_n v_n = O.

We cannot have x_0 = 0, because if that were the case, we would obtain a
relation of linear dependence among v_1, ..., v_n. Therefore we can solve
for w in terms of v_1, ..., v_n, namely

w = -(x_1/x_0) v_1 - ... - (x_n/x_0) v_n.

This proves that w is a linear combination of v_1, ..., v_n, and hence that
{v_1, ..., v_n} is a basis.


Theorem 5.5. Let V be a vector space of dimension n, and let v_1, ..., v_n
be linearly independent elements of V. Then v_1, ..., v_n constitute a
basis of V.

Proof. According to Theorem 5.1, {v_1, ..., v_n} is a maximal set of
linearly independent elements of V. Hence it is a basis by Theorem 5.4.


Theorem 5.6. Let V be a vector space of dimension n and let W be a
subspace, also of dimension n. Then W = V.


Proof. A basis for W must also be a basis for V.


Theorem 5.7. Let V be a vector space of dimension n. Let r be a
positive integer with r < n, and let v_1, ..., v_r be linearly independent
elements of V. Then one can find elements v_{r+1}, ..., v_n such that

{v_1, ..., v_n}

is a basis of V.


Proof. Since r < n we know that {v_1, ..., v_r} cannot form a basis of V,
and thus cannot be a maximal set of linearly independent elements of V.
In particular, we can find v_{r+1} in V such that

v_1, ..., v_{r+1}

are linearly independent. If r + 1 < n, we can repeat the argument. We
can thus proceed stepwise (by induction) until we obtain n linearly
independent elements {v_1, ..., v_n}. These must be a basis by Theorem 5.4,
and our theorem is proved.


Theorem 5.8. Let V be a vector space having a basis consisting of n
elements. Let W be a subspace which does not consist of O alone. Then
W has a basis, and the dimension of W is ≤ n.


Proof. Let w_1 be a non-zero element of W. If {w_1} is not a maximal
set of linearly independent elements of W, we can find an element w_2 of
W such that w_1, w_2 are linearly independent. Proceeding in this manner,
one element at a time, there must be an integer m ≤ n such that we can
find linearly independent elements w_1, w_2, ..., w_m, and such that

{w_1, ..., w_m}

is a maximal set of linearly independent elements of W (by Theorem 5.1
we cannot go on indefinitely finding linearly independent elements, and
the number of such elements is at most n). If we now use Theorem 5.4,
we conclude that {w_1, ..., w_m} is a basis for W.


Exercises III, §5


1. What is the dimension of the following spaces (refer to Exercises 11
through 16 of the preceding section):

(a) 2 × 2 matrices
(b) m × n matrices
(c) n × n matrices all of whose components are 0 except possibly on the
diagonal.
(d) Upper triangular n × n matrices.
(e) Symmetric 2 × 2 matrices.
(f) Symmetric 3 × 3 matrices.
(g) Symmetric n × n matrices.


2. Let V be a subspace of R^2. What are the possible dimensions for V? Show
that if V ≠ R^2, then either V = {O}, or V is a straight line passing
through the origin.

3. Let V be a subspace of R^3. What are the possible dimensions for V? Show
that if V ≠ R^3, then either V = {O}, or V is a straight line passing
through the origin, or V is a plane passing through the origin.


III, §6. The Rank of a Matrix

Let

A =
a_11 ... a_1n
...
a_m1 ... a_mn

be an m × n matrix. The columns of A generate a vector space, which is
a subspace of R^m. The dimension of that subspace is called the column
rank of A. In light of Theorem 5.4, the column rank is equal to the
maximum number of linearly independent columns. Similarly, the rows
of A generate a subspace of R^n, and the dimension of this subspace is
called the row rank. Again by Theorem 5.4, the row rank is equal to the
maximum number of linearly independent rows. We shall prove below
that these two ranks are equal to each other. We shall give two proofs.
The first in this section depends on certain operations on the rows and
columns of a matrix. Later we shall give a more geometric proof using
the notion of perpendicularity.

We define the row space of A to be the subspace generated by the 
rows of A. We define the column space of A to be the subspace gener- 
ated by the columns. 

Consider the following operations on the rows of a matrix. 


Row 1. Adding a scalar multiple of one row to another.
Row 2. Interchanging rows.
Row 3. Multiplying one row by a non-zero scalar.

These are called the row operations (sometimes, the elementary row
operations). We have similar operations for columns, which will be
denoted by Col 1, Col 2, Col 3 respectively. We shall study the effect of
these operations on the ranks.

First observe that each one of the above operations has an inverse
operation in the sense that by performing similar operations we can
revert to the original matrix. For instance, let us change a matrix A by
adding c times the second row to the first. We obtain a new matrix B
whose rows are

B_1 = A_1 + cA_2, A_2, ..., A_m.

If we now add -cA_2 to the first row of B, we get back A_1. A similar
argument can be applied to any two rows.

If we interchange two rows, then interchange them again, we revert to
the original matrix.

If we multiply a row by a number c ≠ 0, then multiplying again by
c^{-1} yields the original row.


Theorem 6.1. Row and column operations do not change the row rank
of a matrix, nor do they change the column rank.


Proof. First we note that interchanging rows of a matrix does not
affect the row rank since the subspace generated by the rows is the same,
no matter in what order we take the rows.

Next, suppose we add a scalar multiple of one row to another. We
keep the notation before the theorem, so the new rows are

B_1 = A_1 + cA_2, A_2, ..., A_m.

Any linear combination of the rows of B, namely any linear combination
of

B_1, A_2, ..., A_m

is also a linear combination of A_1, A_2, ..., A_m. Consequently the row
space of B is contained in the row space of A. Hence by Theorem 5.6,
we have

row rank of B ≤ row rank of A.

Since A is also obtained from B by a similar operation, we get the
reverse inequality

row rank of A ≤ row rank of B.

Hence these two row ranks are equal.

Third, if we multiply a row A_i by c ≠ 0, we get the new row cA_i. But
A_i = c^{-1}(cA_i), so the row spaces of the matrix A and the new matrix
obtained by multiplying the row by c are the same. Hence the third
operation also does not change the row rank.

We could have given the above argument with any pair of rows A_i,
A_j (i ≠ j), so we have seen that row operations do not change the row
rank.


We now prove that they do not change the column rank.

Again consider the matrix obtained by adding a scalar multiple of the
second row to the first:

B =
a_11 + ca_21   a_12 + ca_22   ...   a_1n + ca_2n
a_21           a_22           ...   a_2n
...
a_m1           a_m2           ...   a_mn

Let B^1, ..., B^n be the columns of this new matrix B. We shall see that
the relations of linear dependence between the columns of B are precisely
the same as the relations of linear dependence between the columns of A.
In other words:

A vector X = (x_1, ..., x_n) gives a relation of linear dependence

x_1 B^1 + ... + x_n B^n = O

between the columns of B if and only if X gives a relation of linear
dependence

x_1 A^1 + ... + x_n A^n = O

between the columns of A.


Proof. We know from Chapter II, §2 that a relation of linear dependence
among the columns can be written in terms of the dot product
with the rows of the matrix. So suppose we have a relation

x_1 B^1 + ... + x_n B^n = O.

This is equivalent with the fact that

X·B_i = 0 for i = 1, ..., m.

Therefore

X·(A_1 + cA_2) = 0, X·A_2 = 0, ..., X·A_m = 0.

The first equation can be written

X·A_1 + cX·A_2 = 0.

Since X·A_2 = 0 we conclude that X·A_1 = 0. Hence X is perpendicular
to the rows of A. Hence X gives a linear relation among the columns of
A. The converse is proved similarly.

The above statement proves that if r among the columns of B are
linearly independent, then r among the columns of A are also linearly
independent, and conversely. Therefore A and B have the same column
rank.

We leave the verification that the other row operations do not change
the column ranks to the reader.

Similarly, one proves that the column operations do not change the
row rank. The situation is symmetric between rows and columns. This
concludes the proof of the theorem.


Theorem 6.2. Let A be a matrix of row rank r. By a succession of row
and column operations, the matrix can be transformed to the matrix
having components equal to 1 on the diagonal of the first r rows and
columns, and 0 everywhere else.

1 0 ... 0 0 ... 0
0 1 ... 0 0 ... 0
...
0 0 ... 1 0 ... 0
0 0 ... 0 0 ... 0
...
0 0 ... 0 0 ... 0

In particular, the row rank is equal to the column rank.


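Proof. Suppose $r \neq 0$ so the matrix is not the zero matrix. Some
component is not zero. After interchanging rows and columns, we may
assume that this component is in the upper left-hand corner, that is this
component is equal to $a_{11} \neq 0$. Now we go down the first column. We
multiply the first row by $a_{21}/a_{11}$ and subtract it from the second row.
We then obtain a new matrix with 0 in the first place of the second
row. Next we multiply the first row by $a_{31}/a_{11}$ and subtract it from the
third row. Then our new matrix has first component equal to 0 in the
third row. Proceeding in the same way, we can transform the matrix so
that it is of the form

\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
0 & a'_{22} & \cdots & a'_{2n} \\
\vdots & \vdots & & \vdots \\
0 & a'_{m2} & \cdots & a'_{mn}
\end{pmatrix}.
\]

Next, we subtract appropriate multiples of the first column from the
second, third, ..., n-th column to get zeros in the first row. This
transforms the matrix to a matrix of type

\[
\begin{pmatrix}
a_{11} & 0 & \cdots & 0 \\
0 & a'_{22} & \cdots & a'_{2n} \\
\vdots & \vdots & & \vdots \\
0 & a'_{m2} & \cdots & a'_{mn}
\end{pmatrix}.
\]

Now we have an (m - 1) x (n - 1) matrix in the lower right. If we
perform row and column operations on all but the first row and column,
then first we do not disturb the first component $a_{11}$; and second we can
repeat the argument, in order to obtain a matrix of the form

\[
\begin{pmatrix}
a_{11} & 0 & 0 & \cdots & 0 \\
0 & a'_{22} & 0 & \cdots & 0 \\
0 & 0 & a''_{33} & \cdots & a''_{3n} \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & a''_{m3} & \cdots & a''_{mn}
\end{pmatrix}.
\]

Proceeding stepwise by induction we reach a matrix of the form

\[
\begin{pmatrix}
a_{11} & 0 & \cdots & 0 & \cdots & 0 \\
0 & a_{22} & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & & \vdots \\
0 & 0 & \cdots & a_{ss} & \cdots & 0 \\
0 & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & \cdots & 0
\end{pmatrix}
\]

with diagonal elements $a_{11},\ldots,a_{ss}$ which are $\neq 0$. We divide the first row
by $a_{11}$, the second row by $a_{22}$, etc. We then obtain a matrix

\[
\begin{pmatrix}
1 & 0 & \cdots & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 & \cdots & 0 \\
0 & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & \cdots & 0
\end{pmatrix}.
\]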


Thus we have the unit s x s matrix in the upper left-hand corner, and
zeros everywhere else. Since row and column operations do not change
the row or column rank, it follows that r = s, and also that the row rank
is equal to the column rank. This proves the theorem.

Since we have proved that the row rank is equal to the column rank,
we can now omit "row" or "column" and just speak of the rank of a
matrix. Thus by definition the rank of a matrix is equal to the
dimension of the space generated by the rows.


Remark. Although the systematic procedure provides an effective 
method to find the rank, in practice one can usually take shortcuts to 
get as many zeros as possible by making row and column operations, so 
that at some point it becomes obvious what the rank of the matrix is. 

Of course, one can also use the simple mechanism of linear equations 
to find the rank. 


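Example. Find the rank of the matrix

\[
\begin{pmatrix} 2 & 1 & 1 \\ 0 & 1 & -1 \end{pmatrix}.
\]

There are only two rows, so the rank is at most 2. On the other hand,
the two columns

\[
\begin{pmatrix} 2 \\ 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 \\ 1 \end{pmatrix}
\]

are linearly independent, for if a, b are numbers such that

\[
a \begin{pmatrix} 2 \\ 0 \end{pmatrix} + b \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\]

then

\[
2a + b = 0, \qquad b = 0,
\]

so that a = 0. Therefore the two columns are linearly independent, and
the rank is equal to 2.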


Later we shall also see that determinants give a computational way of
determining when vectors are linearly independent, and thus can be used
to determine the rank.


Example. Find the rank of the matrix

\[
\begin{pmatrix}
1 & 2 & -3 \\
2 & 1 & 0 \\
-2 & -1 & 3 \\
-1 & 4 & -2
\end{pmatrix}.
\]

We subtract twice the first column from the second and add 3 times the
first column to the third. This gives

\[
\begin{pmatrix}
1 & 0 & 0 \\
2 & -3 & 6 \\
-2 & 3 & -3 \\
-1 & 6 & -5
\end{pmatrix}.
\]

We add 2 times the second column to the third. This gives

\[
\begin{pmatrix}
1 & 0 & 0 \\
2 & -3 & 0 \\
-2 & 3 & 3 \\
-1 & 6 & 7
\end{pmatrix}.
\]


This matrix is in column echelon form, and it is immediate that the first 
three rows or columns are linearly independent. Since there are only 
three columns, it follows that the rank is 3. 
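
The reduction above can also be carried out mechanically. The following Python sketch is not part of the text. It assumes that the matrix of the preceding example is the 4 x 3 matrix written in the code, and the function name rank is our own choice. It uses row operations with exact fractions (which, by Theorem 6.2, give the same rank as column operations) and counts the pivots that are found.

```python
from fractions import Fraction

def rank(rows):
    """Return the rank of a matrix given as a list of rows (illustrative helper)."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        # Look for a row at or below position r with a non-zero entry in this column.
        pivot = next((i for i in range(r, n_rows) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Subtract multiples of the pivot row to get zeros below the pivot.
        for i in range(r + 1, n_rows):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# The matrix of the example, as assumed here.
A = [[1, 2, -3],
     [2, 1, 0],
     [-2, -1, 3],
     [-1, 4, -2]]

# The transpose, to compare row rank and column rank.
A_transpose = [list(column) for column in zip(*A)]

print(rank(A), rank(A_transpose))
```

Both calls should report the same number, in agreement with the fact that the row rank is equal to the column rank.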


Exercises III, §6


1. Find the rank of the following matrices. 


(a) /2 1 3 (b) /—1 Z «9 
( 2 4 | 3 4 = 
(c) /1 2. 4d (d) 1 2. 23 
E 4 A x 49 3 
4 8-12 

0 0 0 

(e) /2 0 (fy /—1 0 1 
k “a 0 2 3 

0 0 7 

(g) 2 0 0 (h) 1 J its 
eus 1 2 I.e 3 

3 Ra A. i8 12 

|] =a 5 
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2. Let A be a triangular matrix 


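\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
0 & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_{nn}
\end{pmatrix}.
\]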


Assume that none of the diagonal elements is equal to 0. What is the rank of 
A? 


3. Let A be an m x n matrix and let B be an n x r matrix, so we can form the 
product AB. 
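(a) Show that the columns of AB are linear combinations of the columns of
A. Thus prove that

\[ \operatorname{rank} AB \leq \operatorname{rank} A. \]

(b) Prove that $\operatorname{rank} AB \leq \operatorname{rank} B$. [Hint: Use the fact that

\[ \operatorname{rank} AB = \operatorname{rank} {}^t(AB) \]

and

\[ \operatorname{rank} B = \operatorname{rank} {}^tB.] \]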


CHAPTER IV 


Linear Mappings 


We shall first define the general notion of a mapping, which generalizes 
the notion of a function. Among mappings, the linear mappings are the 
most important. A good deal of mathematics is devoted to reducing 
questions concerning arbitrary mappings to linear mappings. For one 
thing, they are interesting in themselves, and many mappings are linear. 
On the other hand, it is often possible to approximate an arbitrary 
mapping by a linear one, whose study is much easier than the study of 
the original mapping. This is done in the calculus of several variables. 


IV, §1. Mappings


Let S, S' be two sets. A mapping from S to S' is an association which to
every element of S associates an element of S'. Instead of saying that F
is a mapping from S into S', we shall often write the symbols $F\colon S \to S'$.
A mapping will also be called a map, for the sake of brevity.

A function is a special type of mapping, namely it is a mapping from 
a set into the set of numbers, i.e. into R. 

We extend to mappings some of the terminology we have used for 
functions. For instance, if $T\colon S \to S'$ is a mapping, and if u is an element
of S, then we denote by T(u), or Tu, the element of S’ associated to u by 
T. We call T(u) the value of T at u, or also the image of u under T. 
The symbols T(u) are read “T of u”. The set of all elements T(u), when 
u ranges over all elements of S, is called the image of T. If W is a subset 
of S, then the set of elements T(w), when w ranges over all elements of 
W, is called the image of W under T, and is denoted by T(W).

Let $F\colon S \to S'$ be a map from a set S into a set S'. If x is an element
of S, we often write

\[ x \mapsto F(x) \]

with a special arrow $\mapsto$ to denote the image of x under F. Thus, for
instance, we would speak of the map F such that $F(x) = x^2$ as the map
$x \mapsto x^2$.


Example 1. For any set S we have the identity mapping $I\colon S \to S$. It is
defined by I(x) = x for all x.


Example 2. Let S and S' be both equal to R. Let $f\colon \mathbf{R} \to \mathbf{R}$ be the
function $f(x) = x^2$ (i.e. the function whose value at a number x is $x^2$).
Then f is a mapping from R into R. Its image is the set of numbers
$\geq 0$.


Example 3. Let S be the set of numbers $\geq 0$, and let S' = R. Let
$g\colon S \to S'$ be the function such that $g(x) = x^{1/2}$. Then g is a mapping
from S into R.


Example 4. Let S be the set of functions having derivatives of all
orders on the interval $0 < t < 1$, and let S' = S. Then the derivative
D = d/dt is a mapping from S into S. Indeed, our map D associates the
function df/dt = Df to the function f. According to our terminology, Df
is the value of the mapping D at f.


Example 5. Let S be the set $\mathbf{R}^3$, i.e. the set of 3-tuples. Let
A = (2, 3, -1). Let $L\colon \mathbf{R}^3 \to \mathbf{R}$ be the mapping whose value at a vector
X = (x, y, z) is $A \cdot X$. Then $L(X) = A \cdot X$. If X = (1, 1, -1), then the
value of L at X is 6.


Just as we did with functions, we describe a mapping by giving its
values. Thus, instead of making the statement in Example 5 describing
the mapping L, we would also say: Let $L\colon \mathbf{R}^3 \to \mathbf{R}$ be the mapping
$L(X) = A \cdot X$. This is somewhat incorrect, but is briefer, and does not
usually give rise to confusion. More correctly, we can write $X \mapsto L(X)$ or
$X \mapsto A \cdot X$ with the special arrow $\mapsto$ to denote the effect of the map L on
the element X.


Example 6. Let $F\colon \mathbf{R}^2 \to \mathbf{R}^2$ be the mapping given by

\[ F(x, y) = (2x, 2y). \]

Describe the image under F of the points lying on the circle $x^2 + y^2 = 1$.

Let (x, y) be a point on the circle of radius 1.
Let u = 2x and v = 2y. Then u, v satisfy the relation

\[ (u/2)^2 + (v/2)^2 = 1 \]

or in other words,

\[ \frac{u^2}{4} + \frac{v^2}{4} = 1. \]

Hence (u, v) is a point on the circle of radius 2. Therefore the image
under F of the circle of radius 1 is a subset of the circle of radius 2.
Conversely, given a point (u, v) such that

\[ u^2 + v^2 = 4, \]

let x = u/2 and y = v/2. Then the point (x, y) satisfies the equation
$x^2 + y^2 = 1$, and hence is a point on the circle of radius 1. Furthermore,

\[ F(x, y) = (u, v). \]


Hence every point on the circle of radius 2 is the image of some point 
on the circle of radius 1. We conclude finally that the image of the circle 
of radius 1 under F is precisely the circle of radius 2. 
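
The two inclusions of Example 6 can be tried out numerically. The following Python sketch is only an illustration, not a proof: it tests finitely many sample points, and the helper name on_circle, the tolerance, and the sample angles are our own assumptions.

```python
import math

def F(x, y):
    """The mapping F(x, y) = (2x, 2y) of Example 6."""
    return (2 * x, 2 * y)

def on_circle(point, radius, tol=1e-12):
    """True if the point lies on the circle of the given radius around the origin."""
    x, y = point
    return abs(x * x + y * y - radius * radius) < tol

# Sample points on the circle of radius 1 (the number 12 is arbitrary).
unit_points = [(math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12))
               for k in range(12)]

# First inclusion: the image of each sample point lies on the circle of radius 2.
images = [F(x, y) for (x, y) in unit_points]
image_in_big_circle = all(on_circle(p, 2) for p in images)

# Second inclusion: a point (u, v) on the circle of radius 2 is the image
# of (u/2, v/2), which lies on the circle of radius 1.
u, v = 2 * math.cos(0.7), 2 * math.sin(0.7)
preimage = (u / 2, v / 2)
preimage_on_unit_circle = on_circle(preimage, 1)
maps_back = F(*preimage) == (u, v)

print(image_in_big_circle, preimage_on_unit_circle, maps_back)
```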


Note. In general, let S, S' be two sets. To prove that S = S', one
frequently proves that S is a subset of S' and that S' is a subset of S. 
This is what we did in the preceding argument. 


Example 7. This example is particularly important in geometric
applications. Let V be a vector space, and let u be a fixed element of V. We
let

\[ T_u\colon V \to V \]

be the map such that $T_u(v) = v + u$. We call $T_u$ the translation by u. If S
is any subset of V, then $T_u(S)$ is called the translation of S by u, and
consists of all vectors v + u, with $v \in S$. We often denote it by S + u. In
the next picture, we draw a set S and its translation by a vector u.

Figure 1


Example 8. Rotation counterclockwise around the origin by an angle
$\theta$ is a mapping, which we may denote by $R_\theta$. Let $\theta = \pi/2$. The image of
the point (1, 0) under the rotation $R_{\pi/2}$ is the point (0, 1). We may write
this as

\[ R_{\pi/2}(1, 0) = (0, 1). \]
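
As an illustration, the rotation can be written out in coordinates. The following Python sketch assumes the standard formula $R_\theta(x, y) = (x\cos\theta - y\sin\theta,\ x\sin\theta + y\cos\theta)$, which is not derived in the text above, and the function name rotation is our own. Since the computation uses floating point numbers, the image of (1, 0) is only approximately (0, 1).

```python
import math

def rotation(theta):
    """Return the map R_theta, rotation counterclockwise by the angle theta."""
    def R(point):
        x, y = point
        return (x * math.cos(theta) - y * math.sin(theta),
                x * math.sin(theta) + y * math.cos(theta))
    return R

R = rotation(math.pi / 2)
image = R((1.0, 0.0))

# Compare with (0, 1) up to rounding error.
close_to_expected = math.isclose(image[0], 0.0, abs_tol=1e-12) and \
                    math.isclose(image[1], 1.0, abs_tol=1e-12)
print(close_to_expected)
```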


Example 9. Let S be a set. A mapping from S into R will be called a 
function, and the set of such functions will be called the set of functions 
defined on S. Let f, g be two functions defined on S. We can define 
their sum just as we did for functions of numbers, namely f + g is the
function whose value at an element t of S is f(t) + g(t). We can also
define the product of f by a number c. It is the function whose value at 
t is cf(t). Then the set of mappings from S into R is a vector space. 


Example 10. Let S be a set and let V be a vector space. Let F, G be 
two mappings from S into V. We can define their sum in the same way 
as we defined the sum of functions, namely the sum F + G is the map- 
ping whose value at an element t of S is F(t) + G(t). We also define the 
product of F by a number c to be the mapping whose value at an 
element t of S is cF(t). It is easy to verify that conditions VS 1 through 
VS 8 are satisfied. 


Exercises IV, §1


1. In Example 4, give Df as a function of x when f is the function:

(a) $f(x) = \sin x$ (b) $f(x) = e^x$ (c) $f(x) = \log x$

2. Let P = (0, 1). Let R be rotation by $\pi/4$. Give the coordinates of the image
of P under R, i.e. give R(P).

3. In Example 5, give L(X) when X is the vector:
(a) (1, 2, -3) (b) (-1, 3, 0) (c) (2, 1, 1)


4. Let $F\colon \mathbf{R} \to \mathbf{R}^2$ be the mapping such that $F(t) = (e^t, t)$. What is F(1), F(0),
F(-1)?

5. Let $G\colon \mathbf{R} \to \mathbf{R}^2$ be the mapping such that G(t) = (t, 2t). Let F be as in
Exercise 4. What is (F + G)(1), (F + G)(2), (F + G)(0)?


6. Let F be as in Exercise 4. What is (2F)(0), $(\pi F)(1)$?


7. Let A = (1, 1, -1, 3). Let $F\colon \mathbf{R}^4 \to \mathbf{R}$ be the mapping such that for any vector
$X = (x_1, x_2, x_3, x_4)$ we have $F(X) = X \cdot A + 2$. What is the value of F(X)
when (a) X = (1, 1, 0, -1) and (b) X = (2, 3, -1, 1)?


In Exercises 8 through 12, refer to Example 6. In each case, to prove that the 
image is equal to a certain set S, you must prove that the image is contained in 
S, and also that every element of S is in the image. 


8. Let $F\colon \mathbf{R}^2 \to \mathbf{R}^2$ be the mapping defined by F(x, y) = (2x, 3y). Describe the
image of the points lying on the circle $x^2 + y^2 = 1$.


9. Let $F\colon \mathbf{R}^2 \to \mathbf{R}^2$ be the mapping defined by F(x, y) = (xy, y). Describe the
image under F of the straight line x = 2.


10. Let F be the mapping defined by $F(x, y) = (e^x \cos y, e^x \sin y)$. Describe the
image under F of the line x = 1. Describe more generally the image under F
of a line x = c, where c is a constant.


11. Let F be the mapping defined by F(t, u) = (cos t, sin t, u). Describe geometri- 
cally the image of the (t, u)-plane under F. 


12. Let F be the mapping defined by F(x, y) = (x/3, y/4). What is the image 
under F of the ellipse 


IV, §2. Linear Mappings

Let V, W be two vector spaces. A linear mapping

\[ L\colon V \to W \]


is a mapping which satisfies the following two properties. First, for any 
elements u, v in V, and any scalar c, we have: 


LM 1. L(u + v) = L(u) + L(v). 
LM 2. L(cu) = cL(u). 


Example 1. The most important linear mapping of this course is
described as follows. Let A be a given m x n matrix. Define

\[ L_A\colon \mathbf{R}^n \to \mathbf{R}^m \]

by the formula

\[ L_A(X) = AX. \]

Then $L_A$ is linear. Indeed, this is nothing but a summary way of
expressing the properties

\[ A(X + Y) = AX + AY \quad \text{and} \quad A(cX) = cAX \]

for any vertical X, Y in $\mathbf{R}^n$ and any number c.
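
The two properties can be checked on a particular case. The following Python sketch is an illustration only: the 2 x 3 matrix A, the vectors X, Y, and the number c are hypothetical values chosen for the occasion, and the helper names apply, add, and scale are our own. A check of one case is of course not a proof.

```python
def apply(A, X):
    """Return the product AX for a matrix A (list of rows) and a vector X."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

def add(X, Y):
    """Componentwise sum of two vectors."""
    return [x + y for x, y in zip(X, Y)]

def scale(c, X):
    """Product of the vector X by the number c."""
    return [c * x for x in X]

# A hypothetical 2 x 3 matrix, so that L_A maps R^3 into R^2.
A = [[1, 2, 0],
     [-1, 3, 4]]
X = [1, -2, 5]
Y = [4, 0, -1]
c = 7

# A(X + Y) = AX + AY
first_property_holds = apply(A, add(X, Y)) == add(apply(A, X), apply(A, Y))
# A(cX) = cAX
second_property_holds = apply(A, scale(c, X)) == scale(c, apply(A, X))

print(first_property_holds, second_property_holds)
```

Integer entries are used so that the comparisons are exact.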


Example 2. The dot product is essentially a special case of the first
example. Let $A = (a_1,\ldots,a_n)$ be a fixed vector, and define

\[ L_A(X) = A \cdot X. \]

Then $L_A$ is a linear map from $\mathbf{R}^n$ into R, because

\[ A \cdot (X + Y) = A \cdot X + A \cdot Y \quad \text{and} \quad A \cdot (cX) = c(A \cdot X). \]


Note that the dot product can also be viewed as multiplication of ma- 
trices if we view A as a row vector, and X as a column vector. 


Example 3. Let V be any vector space. The mapping which associates 
to any element u of V this element itself is obviously a linear mapping, 
which is called the identity mapping. We denote it by I. Thus I(u) = u.


Example 4. Let V, W be any vector spaces. The mapping which asso- 
ciates the element O in W to any element u of V is called the zero 
mapping and is obviously linear. 


Example 5. Let V be the set of functions which have derivatives of all 
orders. Then the derivative $D\colon V \to V$ is a linear mapping. This is simply
a brief way of summarizing standard properties of the derivative, namely


D(f + g) = Df + Dg, 
D(cf) = cD(f). 


Example 6. Let $V = \mathbf{R}^3$ be the vector space of vectors in 3-space. Let
$V' = \mathbf{R}^2$ be the vector space of vectors in 2-space. We can define a
mapping

\[ F\colon \mathbf{R}^3 \to \mathbf{R}^2 \]

by the projection, namely F(x, y, z) = (x, y). We leave it to you to check
that the conditions LM 1 and LM 2 are satisfied.

More generally, suppose n = r + s is expressed as a sum of two positive
integers. We can separate the coordinates $(x_1,\ldots,x_n)$ into two
bunches $(x_1,\ldots,x_r, x_{r+1},\ldots,x_{r+s})$, namely the first r coordinates, and the
last s coordinates. Let

\[ F\colon \mathbf{R}^n \to \mathbf{R}^r \]

be the map such that $F(x_1,\ldots,x_n) = (x_1,\ldots,x_r)$. Then you can verify
easily that F is linear. We call F the projection on the first r coordinates.
Similarly, we would have a projection on the last s coordinates, by means
of the linear map L such that

\[ L(x_1,\ldots,x_n) = (x_{r+1},\ldots,x_n). \]

Example 7. In the calculus of several variables, one defines the
gradient of a function f to be

\[ \operatorname{grad} f(X) = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right). \]


Then for two functions f, g, we have 
grad( f + g) = grad f + grad g 
and for any number c, 
grad(cf) = c-grad f. 

Thus grad is a linear map. 

Let $L\colon V \to W$ be a linear mapping. Let u, v, w be elements of V. Then

\[ L(u + v + w) = L(u) + L(v) + L(w). \]

This can be seen stepwise, using the definition of linear mappings. Thus

\[ L(u + v + w) = L(u + v) + L(w) = L(u) + L(v) + L(w). \]

Similarly, given a sum of more than three elements, an analogous
property is satisfied. For instance, let $u_1,\ldots,u_n$ be elements of V. Then

\[ L(u_1 + \cdots + u_n) = L(u_1) + \cdots + L(u_n). \]

The sum on the right can be taken in any order. A formal proof can
easily be given by induction, and we omit it.

If $a_1,\ldots,a_n$ are numbers, then

\[ L(a_1u_1 + \cdots + a_nu_n) = a_1L(u_1) + \cdots + a_nL(u_n). \]

We show this for three elements.

\[ L(a_1u + a_2v + a_3w) = L(a_1u) + L(a_2v) + L(a_3w) = a_1L(u) + a_2L(v) + a_3L(w). \]

With the notation of summation signs, we would write

\[ L\left( \sum_{i=1}^{n} a_iu_i \right) = \sum_{i=1}^{n} a_iL(u_i). \]


In practice, the following properties will be obviously satisfied, but it 
turns out they can be proved from the axioms of linear maps and vector 
spaces. 


LM 3. Let $L\colon V \to W$ be a linear map. Then L(O) = O.

Proof. We have

\[ L(O) = L(O + O) = L(O) + L(O). \]

Subtracting L(O) from both sides yields O = L(O), as desired.


LM 4. Let $L\colon V \to W$ be a linear map. Then L(-v) = -L(v).


Proof. We have

\[ O = L(O) = L(v - v) = L(v) + L(-v). \]


Add —L(v) to both sides to get the desired assertion. 


We observe that the values of a linear map are determined by know- 
ing the values on the elements of a basis. 


Example 8. Let $L\colon \mathbf{R}^2 \to \mathbf{R}^2$ be a linear map. Suppose that

\[ L(1, 1) = (1, 4) \quad \text{and} \quad L(2, -1) = (-2, 3). \]

Find L(3, -1).

To do this, we write (3, -1) as a linear combination of (1, 1) and
(2, -1). Thus we have to solve

\[ (3, -1) = x(1, 1) + y(2, -1). \]

This amounts to solving

\[ x + 2y = 3, \qquad x - y = -1. \]

The solution is x = 1/3, y = 4/3. Hence

\[ L(3, -1) = xL(1, 1) + yL(2, -1) = \frac{1}{3}(1, 4) + \frac{4}{3}(-2, 3) = \left( \frac{-7}{3}, \frac{16}{3} \right). \]
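
The computation of Example 8 can be retraced with exact fractions. The following Python sketch is an illustration under our own assumptions: the helper name solve_2x2 is hypothetical, and it solves the two equations by the determinant formula for a 2 x 2 system, a shortcut which the text has not introduced at this point.

```python
from fractions import Fraction

def solve_2x2(a, b, target):
    """Find numbers x, y with x*a + y*b = target, for vectors a, b, target in the plane."""
    det = a[0] * b[1] - a[1] * b[0]  # non-zero when a, b are linearly independent
    x = Fraction(target[0] * b[1] - target[1] * b[0], det)
    y = Fraction(a[0] * target[1] - a[1] * target[0], det)
    return x, y

# Write (3, -1) as x(1, 1) + y(2, -1).
x, y = solve_2x2((1, 1), (2, -1), (3, -1))

# Use linearity: L(3, -1) = x L(1, 1) + y L(2, -1), with the values given in the example.
L_11 = (1, 4)
L_2m1 = (-2, 3)
value = tuple(x * p + y * q for p, q in zip(L_11, L_2m1))

print(x, y, value)
```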


Example 9. Let V be a vector space, and let $L\colon V \to \mathbf{R}$ be a linear
map. We contend that the set S of all elements v in V such that L(v) < 0
is convex.

Proof. Let L(v) < 0 and L(w) < 0. Let 0 < t < 1. Then

\[ L(tv + (1 - t)w) = tL(v) + (1 - t)L(w). \]

Then tL(v) < 0 and (1 - t)L(w) < 0 so tL(v) + (1 - t)L(w) < 0, whence
tv + (1 - t)w lies in S. If t = 0 or t = 1, then tv + (1 - t)w is equal to w
or v respectively, and this also lies in S. This proves our assertion.

For a generalization of this example, see Exercise 14. 


The coordinates of a linear map 


Let first

\[ F\colon V \to \mathbf{R}^n \]

be any mapping. Then each value F(v) is an element of $\mathbf{R}^n$, and so has
coordinates. Thus we can write

\[ F(v) = (F_1(v),\ldots,F_n(v)), \quad \text{or} \quad F = (F_1,\ldots,F_n). \]

Each $F_i$ is a function of V into R, which we write

\[ F_i\colon V \to \mathbf{R}. \]

Example 10. Let $F\colon \mathbf{R}^2 \to \mathbf{R}^3$ be the mapping

\[ F(x, y) = (2x - y, 3x + 4y, x - 5y). \]

Then

\[ F_1(x, y) = 2x - y, \quad F_2(x, y) = 3x + 4y, \quad F_3(x, y) = x - 5y. \]

Observe that each coordinate function can be expressed in terms of a dot
product. For instance, let

\[ A_1 = (2, -1), \quad A_2 = (3, 4), \quad A_3 = (1, -5). \]

Then

\[ F_i(x, y) = A_i \cdot (x, y) \quad \text{for } i = 1, 2, 3. \]

Each function

\[ X \mapsto A_i \cdot X \]

is linear. Quite generally:


Proposition 2.1. Let $F\colon V \to \mathbf{R}^n$ be a mapping of a vector space V into
$\mathbf{R}^n$. Then F is linear if and only if each coordinate function $F_i\colon V \to \mathbf{R}$
is linear, for i = 1,...,n.


Proof. For $v, w \in V$ we have

\[
\begin{aligned}
F(v + w) &= (F_1(v + w),\ldots,F_n(v + w)), \\
F(v) &= (F_1(v),\ldots,F_n(v)), \\
F(w) &= (F_1(w),\ldots,F_n(w)).
\end{aligned}
\]

Thus F(v + w) = F(v) + F(w) if and only if $F_i(v + w) = F_i(v) + F_i(w)$
for all i = 1,...,n by the definition of addition of n-tuples. The same
argument shows that if $c \in \mathbf{R}$, then F(cv) = cF(v) if and only if

\[ F_i(cv) = cF_i(v) \quad \text{for all } i = 1,\ldots,n. \]


This proves the proposition. 


Example 10 (continued). The mapping of Example 10 is linear because 
each coordinate function is linear. Actually, if you write the vector (x, y) 
vertically, you should realize that the mapping F is in fact equal to $L_A$
for some matrix A. What is this matrix A? 


The vector space of linear maps 


Let V, W be two vector spaces. We consider the set of all linear
mappings from V into W, and denote this set by $\mathcal{L}(V, W)$, or simply $\mathcal{L}$ if the
reference to V and W is clear. We shall define the addition of linear
mappings and their multiplication by numbers in such a way as to make
$\mathcal{L}$ into a vector space.

Let $L\colon V \to W$ and let $F\colon V \to W$ be two linear mappings. We define
their sum L + F to be the map whose value at an element u of V is
L(u) + F(u). Thus we may write

\[ (L + F)(u) = L(u) + F(u). \]

The map L + F is then a linear map. Indeed, it is easy to verify that the
two conditions which define a linear map are satisfied. For any elements
u, v of V, we have

\[
\begin{aligned}
(L + F)(u + v) &= L(u + v) + F(u + v) \\
&= L(u) + L(v) + F(u) + F(v) \\
&= L(u) + F(u) + L(v) + F(v) \\
&= (L + F)(u) + (L + F)(v).
\end{aligned}
\]

Furthermore, if c is a number, then

\[
\begin{aligned}
(L + F)(cu) &= L(cu) + F(cu) \\
&= cL(u) + cF(u) \\
&= c[L(u) + F(u)] \\
&= c[(L + F)(u)].
\end{aligned}
\]


Hence L + F is a linear map. 

If a is a number, and $L\colon V \to W$ is a linear map, we define a map aL
from V into W by giving its value at an element u of V, namely
(aL)(u) = aL(u). Then it is easily verified that aL is a linear map. We
leave this as an exercise.

We have just defined operations of addition and multiplication by
numbers in our set $\mathcal{L}$. Furthermore, if $L\colon V \to W$ is a linear map, i.e. an
element of $\mathcal{L}$, then we can define -L to be (-1)L, i.e. the product of
the number -1 by L. Finally, we have the zero-map, which to every
element of V associates the element O of W. Then $\mathcal{L}$ is a vector space.
In other words, the set of linear maps from V into W is itself a vector
space. The verification that the rules VS 1 through VS 8 for a vector
space are satisfied is easy and is left to the reader.
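
The operations on linear maps can be imitated with functions. The following Python sketch is only an illustration: the two maps L and F of the plane into itself are hypothetical examples, vectors are represented as tuples, and the names add_maps and scale_map are our own. It forms L + F and 3L and checks the two conditions for a linear map on one choice of vectors, which is not a proof.

```python
def add_maps(L, F):
    """Return the map L + F, whose value at u is L(u) + F(u)."""
    return lambda u: tuple(a + b for a, b in zip(L(u), F(u)))

def scale_map(a, L):
    """Return the map aL, whose value at u is a L(u)."""
    return lambda u: tuple(a * y for y in L(u))

def L(u):
    """A hypothetical linear map of the plane: (x, y) -> (2x + y, y)."""
    x, y = u
    return (2 * x + y, y)

def F(u):
    """Another hypothetical linear map of the plane: (x, y) -> (y, x)."""
    x, y = u
    return (y, x)

def vec_add(p, q):
    return tuple(a + b for a, b in zip(p, q))

def vec_scale(c, p):
    return tuple(c * a for a in p)

S = add_maps(L, F)   # the map L + F
T = scale_map(3, L)  # the map 3L

u, v, c = (1, 2), (-3, 5), 4
sum_is_additive = S(vec_add(u, v)) == vec_add(S(u), S(v))
sum_is_homogeneous = S(vec_scale(c, u)) == vec_scale(c, S(u))

print(S(u), T(u), sum_is_additive, sum_is_homogeneous)
```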


Example 11. Let V = W be the vector space of functions which have
derivatives of all orders. Let D be the derivative, and let I be the
identity. If f is in V, then

\[ (D + I)f = Df + f. \]

Thus, when $f(x) = e^x$, then (D + I)f is the function whose value at x is
$e^x + e^x = 2e^x$.

If $f(x) = \sin x$, then (D + 3I)f is the function such that

\[ ((D + 3I)f)(x) = (Df)(x) + 3If(x) = \cos x + 3 \sin x. \]

We note that $3 \cdot I$ is a linear map, whose value at f is 3f. Thus
$(D + 3 \cdot I)f = Df + 3f$. At any number x, the value of $(D + 3 \cdot I)f$ is
Df(x) + 3f(x). We can also write (D + 3I)f = Df + 3f.


Exercises IV, §2


1. Determine which of the following mappings F are linear.
(a) $F\colon \mathbf{R}^3 \to \mathbf{R}^2$ defined by F(x, y, z) = (x, z).

(b) $F\colon \mathbf{R}^4 \to \mathbf{R}^4$ defined by F(X) = -X.

(c) $F\colon \mathbf{R}^3 \to \mathbf{R}^3$ defined by F(X) = X + (0, -1, 0).

(d) $F\colon \mathbf{R}^2 \to \mathbf{R}^2$ defined by F(x, y) = (2x + y, y).

(e) $F\colon \mathbf{R}^2 \to \mathbf{R}^2$ defined by F(x, y) = (2x, y - x).

(f) $F\colon \mathbf{R}^2 \to \mathbf{R}^2$ defined by F(x, y) = (y, x).

(g) $F\colon \mathbf{R}^2 \to \mathbf{R}$ defined by F(x, y) = xy.


2. Which of the mappings in Exercises 4, 7, 8, 9, of §1 are linear?


3. Let V, W be two vector spaces and let $F\colon V \to W$ be a linear map. Let U be
the subset of V consisting of all elements v such that F(v) = O. Prove that U
is a subspace of V.


4. Let $L\colon V \to W$ be a linear map. Prove that the image of L is a subspace of
W. [This will be done in the next section, but try it now to give you
practice.]


5. Let A, B be two m x n matrices. Assume that

\[ AX = BX \]

for all n-tuples X. Show that A = B. This can also be stated in the form: If
$L_A = L_B$ then A = B.


6. Let $T_u\colon V \to V$ be the translation by a vector u. For which vectors u is $T_u$ a
linear map?


7. Let $L\colon V \to W$ be a linear map.

(a) If S is a line in V, show that the image L(S) is either a line in W or a
point.

(b) If S is a line segment in V, between the points P and Q, show that the
image L(S) is either a point or a line segment in W. Between which
points in W?

(c) Let $v_1, v_2$ be linearly independent elements of V. Assume that $L(v_1)$ and
$L(v_2)$ are linearly independent in W. Let P be an element of V, and let S
be the parallelogram

\[ P + t_1v_1 + t_2v_2 \quad \text{with } 0 \leq t_i \leq 1 \text{ for } i = 1, 2. \]

Show that the image L(S) is a parallelogram in W.

(d) Let v, w be linearly independent elements of a vector space V. Let
$F\colon V \to W$ be a linear map. Assume that F(v), F(w) are linearly
dependent. Show that the image under F of the parallelogram spanned by v
and w is either a point or a line segment.


8. Let $E_1 = (1, 0)$ and $E_2 = (0, 1)$ as usual. Let F be a linear map from $\mathbf{R}^2$ into
itself such that

\[ F(E_1) = (1, 1) \quad \text{and} \quad F(E_2) = (1, 2). \]

Let S be the square whose corners are at (0, 0), (1, 0), (1, 1), and (0, 1). Show
that the image of this square under F is a parallelogram.


9. Let A, B be two non-zero vectors in the plane such that there is no constant
$c \neq 0$ such that B = cA. Let L be a linear mapping of the plane into itself
such that $L(E_1) = A$ and $L(E_2) = B$. Describe the image under L of the
rectangle whose corners are (0, 1), (3, 0), (0, 0), and (3, 1).


10. Let $L\colon \mathbf{R}^2 \to \mathbf{R}^2$ be a linear map, having the following effect on the indicated
vectors:
(a) L(3, 1) = (1, 2) and L(-1, 0) = (1, 1)
(b) L(4, 1) = (1, 1) and L(1, 1) = (3, -2)
(c) L(1, 1) = (2, 1) and L(-1, 1) = (6, 3).
In each case compute L(1, 0).


11. Let L be as in (a), (b), (c), of Exercise 10. Find L(0, 1). 


12. Let V, W be two vector spaces, and $F\colon V \to W$ a linear map. Let $w_1,\ldots,w_n$ be
elements of W which are linearly independent, and let $v_1,\ldots,v_n$ be elements of
V such that $F(v_i) = w_i$ for i = 1,...,n. Show that $v_1,\ldots,v_n$ are linearly
independent.


13. (a) Let V be a vector space and $F\colon V \to \mathbf{R}$ a linear map. Let W be the subset
of V consisting of all elements v such that F(v) = 0. Assume that $W \neq V$,
and let $v_0$ be an element of V which does not lie in W. Show that every
element of V can be written as a sum $w + cv_0$, with some w in W and
some number c.

(b) Show that W is a subspace of V. Let $\{v_1,\ldots,v_n\}$ be a basis of W. Show
that $\{v_0, v_1,\ldots,v_n\}$ is a basis of V.


Convex sets 
14. Show that the image of a convex set under a linear map is convex. 


15. Let $L\colon V \to W$ be a linear map. Let T be a convex set in W and let S be the
set of elements $v \in V$ such that $L(v) \in T$. Show that S is convex.


Remark. Why do these exercises give a more general proof of what you
should already have worked out previously? For instance: Let $A \in \mathbf{R}^n$ and let c
be a number. Then the set of all $X \in \mathbf{R}^n$ such that $X \cdot A \geq c$ is convex. Also if S
is a convex set and c is a number, then cS is convex. How do these statements
fit as special cases of Exercises 14 and 15?


16. Let S be a convex set in V and let $u \in V$. Let $T_u\colon V \to V$ be the translation by
u. Show that the image $T_u(S)$ is convex.




Eigenvectors and eigenvalues. Let V be a vector space, and let $L\colon V \to V$ be a
linear map. An eigenvector v for L is an element of V such that there exists a
scalar c with the property

\[ L(v) = cv. \]

The scalar c is called an eigenvalue of v with respect to L. If $v \neq O$ then c is
uniquely determined. When V is a vector space whose elements are functions, 
then an eigenvector is also called an eigenfunction. 


17. (a) Let V be the space of differentiable functions on R. Let $f(t) = e^{ct}$, where
c is some number. Let L be the derivative d/dt. Show that f is an
eigenfunction for L. What is the eigenvalue?

(b) Let L be the second derivative, that is

\[ L(f) = \frac{d^2f}{dt^2} \]

for any function f. Show that the functions sin t and cos t are
eigenfunctions of L. What are the eigenvalues?


18. Let $L\colon V \to V$ be a linear map, and let W be the subset of elements of V
consisting of all eigenvectors of L with a given eigenvalue c. Show that W is
a subspace.


19. Let $L\colon V \to V$ be a linear map. Let $v_1,\ldots,v_n$ be non-zero eigenvectors for L,
with eigenvalues $c_1,\ldots,c_n$ respectively. Assume that $c_1,\ldots,c_n$ are distinct.
Prove that $v_1,\ldots,v_n$ are linearly independent. [Hint: Use induction.]


IV, §3. The Kernel and Image of a Linear Map


Let $F\colon V \to W$ be a linear map. The image of F is the set of elements w
in W such that there exists an element v of V such that F(v) = w.


The image of F is a subspace of W.


Proof. Observe first that F(O) = O, and hence O is in the image.
Next, suppose that $w_1, w_2$ are in the image. Then there exist elements
$v_1, v_2$ of V such that $F(v_1) = w_1$ and $F(v_2) = w_2$. Hence

\[ F(v_1 + v_2) = F(v_1) + F(v_2) = w_1 + w_2, \]

thereby proving that $w_1 + w_2$ is in the image. If c is a number, then

\[ F(cv_1) = cF(v_1) = cw_1. \]

Hence $cw_1$ is in the image. This proves that the image is a subspace
of W.


Let V, W be vector spaces, and let $F\colon V \to W$ be a linear map. The set
of elements $v \in V$ such that F(v) = O is called the kernel of F.

The kernel of F is a subspace of V.

Proof. Since F(O) = O, we see that O is in the kernel. Let v, w be in 
the kernel. Then F(v + w) = F(v) + F(w) = O + O = O, so that v + w is 
in the kernel. If c is a number, then F(cv) = cF(v) = O so that cv is also 
in the kernel. Hence the kernel is a subspace. 

Example 1. Let $L\colon \mathbf{R}^3 \to \mathbf{R}$ be the map such that

\[ L(x, y, z) = 3x - 2y + z. \]

Thus if A = (3, -2, 1), then we can write

\[ L(X) = X \cdot A = A \cdot X. \]

Then the kernel of L is the set of solutions of the equation

\[ 3x - 2y + z = 0. \]

Of course, this generalizes to n-space. If A is an arbitrary vector in $\mathbf{R}^n$,
we can define the linear map

\[ L_A\colon \mathbf{R}^n \to \mathbf{R} \]

such that $L_A(X) = A \cdot X$. Its kernel can be interpreted as the set of all X
which are perpendicular to A.


Example 2. Let $P\colon \mathbf{R}^3 \to \mathbf{R}^2$ be the projection, such that

\[ P(x, y, z) = (x, y). \]

Then P is a linear map whose kernel consists of all vectors in $\mathbf{R}^3$ whose
first two coordinates are equal to 0, i.e. all vectors


(0, 0, z) 


with arbitrary component z. 

Example 3. Let A be an m x n matrix, and let

\[ L_A\colon \mathbf{R}^n \to \mathbf{R}^m \]

be the linear map such that $L_A(X) = AX$. Then the kernel of $L_A$ is
precisely the subspace of solutions X of the linear equations

\[ AX = O. \]

Example 4. Differential equations. Let D be the derivative. If the real
variable is denoted by x, then we may also write D = d/dx. The
derivative may be iterated, so the second derivative is denoted by $D^2$ (or
$(d/dx)^2$). When applied to a function, we write $D^2f$, so that

\[ (D^2f)(x) = \frac{d^2f}{dx^2}. \]

Similarly for $D^3, D^4, \ldots, D^n$ for the n-th derivative.

Now let V be the vector space of functions which admit derivatives of
all orders. Let $a_0,\ldots,a_m$ be numbers, and let g be an element of V, that is
an infinitely differentiable function. Consider the problem of finding a
solution f to the differential equation

\[ a_m \frac{d^mf}{dx^m} + a_{m-1} \frac{d^{m-1}f}{dx^{m-1}} + \cdots + a_0 f = g. \]

We may rewrite this equation without the variable x, in the form

\[ a_mD^mf + a_{m-1}D^{m-1}f + \cdots + a_0f = g. \]

Each derivative $D^k$ is a linear map from V to itself. Let

\[ L = a_mD^m + a_{m-1}D^{m-1} + \cdots + a_0I. \]

Then L is a sum of linear maps, and is itself a linear map. Thus the
differential equation may be rewritten in the form

\[ L(f) = g. \]

This is now in a similar notation to that used for solving linear
equations. Furthermore, this equation is in "non-homogeneous" form. The
associated homogeneous equation is the equation

\[ L(f) = 0, \]

where the right-hand side is the zero function. Let W be the kernel of L.
Then W is the set (space) of solutions of the homogeneous equation

\[ a_mD^mf + \cdots + a_0f = 0. \]

If there exists one solution $f_0$ for the non-homogeneous equation
L(f) = g, then all solutions are obtained by the translation

\[ f_0 + W = \text{set of all functions } f_0 + f \text{ with } f \text{ in } W. \]


See Exercise 5.

In several previous exercises we looked at the image of lines, planes,
parallelograms under a linear map. For example, if we consider the
plane spanned by two linearly independent vectors $v_1, v_2$ in V, and

\[ L\colon V \to W \]

is a linear map, then the image of that plane will be a plane provided
$L(v_1)$, $L(v_2)$ are also linearly independent. We can give a criterion for
this in terms of the kernel, and the criterion is valid quite generally as
follows.


Theorem 3.1. Let $F\colon V \to W$ be a linear map whose kernel is {O}.
If $v_1,\ldots,v_n$ are linearly independent elements of V, then $F(v_1),\ldots,F(v_n)$
are linearly independent elements of W.


Proof. Let $x_1,\ldots,x_n$ be numbers such that

\[ x_1F(v_1) + \cdots + x_nF(v_n) = O. \]

By linearity, we get

\[ F(x_1v_1 + \cdots + x_nv_n) = O. \]

Since the kernel of F is {O}, it follows that $x_1v_1 + \cdots + x_nv_n = O$.
Since $v_1,\ldots,v_n$ are linearly independent it
follows that $x_i = 0$ for i = 1,...,n. This proves our theorem.


We often abbreviate kernel and image by writing Ker and Im respec- 
tively. The next theorem relates the dimensions of the kernel and image 
of a linear map, with the dimension of the space on which the map is 
defined. 


Theorem 3.2. Let V be a vector space. Let $L\colon V \to W$ be a linear map of
V into another space W. Let n be the dimension of V, q the dimension
of the kernel of L, and s the dimension of the image of L. Then
n = q + s. In other words,

\[ \dim V = \dim \operatorname{Ker} L + \dim \operatorname{Im} L. \]


Proof. If the image of L consists of O only, then our assertion is
trivial. We may therefore assume that s > 0. Let $\{w_1, \ldots, w_s\}$ be a basis
of the image of L. Let $v_1, \ldots, v_s$ be elements of V such that $L(v_i) = w_i$ for
i = 1,...,s. If the kernel is not {O}, let $\{u_1, \ldots, u_q\}$ be a basis of the
kernel. If the kernel is {O}, it is understood that all reference to
$\{u_1, \ldots, u_q\}$ is to be omitted in what follows. We contend that


\[ \{v_1, \ldots, v_s, u_1, \ldots, u_q\} \]

is a basis of V. This will suffice to prove our assertion. Let v be any
element of V. Then there exist numbers $x_1, \ldots, x_s$ such that


\[ L(v) = x_1 w_1 + \cdots + x_s w_s \]

because $\{w_1, \ldots, w_s\}$ is a basis of the image of L. By linearity,


\[ L(v) = L(x_1 v_1 + \cdots + x_s v_s), \]


and again by linearity, subtracting the right-hand side from the left-hand
side, it follows that


\[ L(v - x_1 v_1 - \cdots - x_s v_s) = O. \]

Hence $v - x_1 v_1 - \cdots - x_s v_s$ lies in the kernel of L, and there exist
numbers $y_1, \ldots, y_q$ such that

\[ v - x_1 v_1 - \cdots - x_s v_s = y_1 u_1 + \cdots + y_q u_q. \]


Hence

\[ v = x_1 v_1 + \cdots + x_s v_s + y_1 u_1 + \cdots + y_q u_q \]


is a linear combination of $v_1, \ldots, v_s, u_1, \ldots, u_q$. This proves that these
s + q elements of V generate V.

We now show that they are linearly independent, and hence that they 
constitute a basis. Suppose that there exists a linear relation: 


\[ x_1 v_1 + \cdots + x_s v_s + y_1 u_1 + \cdots + y_q u_q = O. \]


Applying L to this relation, and using the fact that $L(u_j) = O$ for
j = 1,...,q, we obtain


\[ x_1 L(v_1) + \cdots + x_s L(v_s) = O. \]


But $L(v_1), \ldots, L(v_s)$ are none other than $w_1, \ldots, w_s$, which have been
assumed linearly independent. Hence $x_i = 0$ for i = 1,...,s. Hence


\[ y_1 u_1 + \cdots + y_q u_q = O. \]


But $u_1, \ldots, u_q$ constitute a basis of the kernel of L, and in particular, are
linearly independent. Hence all $y_j = 0$ for j = 1,...,q. This concludes the
proof of our assertion. 


Example 1 (continued). The linear map $L: \mathbf{R}^3 \to \mathbf{R}$ of Example 1 is
given by the formula


\[ L(x, y, z) = 3x - 2y + z. \]

Its kernel consists of all solutions of the equation

\[ 3x - 2y + z = 0. \]


Its image is a subspace of R, is not {O}, and hence consists of all of R.
Thus its image has dimension 1. Hence its kernel has dimension 2.
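
As a numerical sanity check of Example 1 (continued), here is a minimal Python sketch. It is an illustration only and not part of the text's argument; the helper names `rank` and `L`, and the two kernel vectors, are choices made here. It computes the dimension of the image as the rank of the 1 x 3 matrix (3, -2, 1), checks that two hand-picked vectors lie in the kernel and are linearly independent, and confirms the count 1 + 2 = 3 predicted by Theorem 3.2.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows), by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivot rows found so far
    for col in range(len(m[0])):
        # look for a row at or below position r with a non-zero entry in this column
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def L(x, y, z):
    """The linear map of Example 1."""
    return 3 * x - 2 * y + z

dim_image = rank([[3, -2, 1]])
# Two kernel vectors found by hand (an illustrative choice, not from the text).
kernel_vectors = [[1, 0, -3], [0, 1, 2]]
assert all(L(*v) == 0 for v in kernel_vectors)
# Their rank shows the kernel has dimension at least 2; Theorem 3.2 says exactly 2.
dim_kernel = rank(kernel_vectors)
print(dim_image, dim_kernel, dim_image + dim_kernel)  # prints: 1 2 3
```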


Example 2 (continued). The image of the projection

\[ P: \mathbf{R}^3 \to \mathbf{R}^2 \]


in Example 2 is all of $\mathbf{R}^2$, and the kernel has dimension 1.


Exercises IV, §3 


Let $L: V \to W$ be a linear map.


1. (a) If S is a one-dimensional subspace of V, show that the image L(S) is 
either a point or a line. 
(b) If S is a two-dimensional subspace of V, show that the image L(S) is 
either a plane, a line or a point. 


2. (a) If S is an arbitrary line in V (cf. Chapter III, §2), show that the image of
S is either a point or a line. 
(b) If S is an arbitrary plane in V, show that the image of S is either a plane, 
a line or a point. 


3. (a) Let $F: V \to W$ be a linear map whose kernel is {O}. Assume that V and
W both have the same dimension n. Show that the image of F is all of
W.
(b) Let $F: V \to W$ be a linear map and assume that the image of F is all of
W. Assume that V and W have the same dimension n. Show that the
kernel of F is {O}.


4. Let $L: V \to W$ be a linear map. Assume dim V > dim W. Show that the
kernel of L is not {O}.


5. Let $L: V \to W$ be a linear map. Let w be an element of W. Let $v_0$ be an
element of V such that $L(v_0) = w$. Show that any solution of the equation
L(X) = w is of type $v_0 + u$, where u is an element of the kernel of L.


6. Let V be the vector space of functions which have derivatives of all orders,
and let $D: V \to V$ be the derivative. What is the kernel of D?


7. Let $D^2$ be the second derivative (i.e. the iteration of D taken twice). What is
the kernel of $D^2$? In general, what is the kernel of $D^n$ (n-th derivative)?


8. (a) Let V, D be as in Exercise 6. Let L = D - I, where I is the identity
mapping of V. What is the kernel of L?
(b) Same question for L = D - aI, where a is a number.

9. (a) What is the dimension of the subspace of $\mathbf{R}^n$ consisting of those vectors
$A = (a_1, \ldots, a_n)$ such that $a_1 + \cdots + a_n = 0$?

(b) What is the dimension of the subspace of the space of n x n matrices $(a_{ij})$
such that

\[ a_{11} + \cdots + a_{nn} = \sum_{i=1}^{n} a_{ii} = 0? \]


10. An n x n matrix A is called skew-symmetric if ${}^tA = -A$. Show that any
n x n matrix A can be written as a sum

\[ A = B + C, \]

where B is symmetric and C is skew-symmetric. [Hint: Let $B = (A + {}^tA)/2$.]
Show that if $A = B_1 + C_1$, where $B_1$ is symmetric and $C_1$ is skew-symmetric,
then $B = B_1$ and $C = C_1$.


11. Let M be the space of all n x n matrices. Let

\[ P: M \to M \]

be the map such that

\[ P(A) = \frac{A + {}^tA}{2}. \]

(a) Show that P is linear.

(b) Show that the kernel of P consists of the space of skew-symmetric
matrices.

(c) Show that the image of P consists of all symmetric matrices. [Watch out.
You have to prove two things: For any matrix A, P(A) is symmetric.
Conversely, given a symmetric matrix B, there exists a matrix A such
that B = P(A). What is the simplest possibility for such A?]

(d) You should have determined the dimension of the space of symmetric
matrices previously, and found n(n + 1)/2. What then is the dimension of
the space of skew-symmetric matrices?

(e) Exhibit a basis for the space of skew-symmetric matrices.


12. Let M be the space of all n x n matrices. Let

\[ Q: M \to M \]

be the map such that

\[ Q(A) = \frac{A - {}^tA}{2}. \]

(a) Show that Q is linear.
(b) Describe the kernel of Q, and determine its dimension.
(c) What is the image of Q?


13. A function (real valued, of a real variable) is called even if f(-x) = f(x). It
is called odd if f(-x) = -f(x).
(a) Verify that sin x is an odd function, and cos x is an even function.
(b) Let V be the vector space of all functions. Define the map


\[ P: V \to V \]


by (Pf)(x) = (f(x) + f(-x))/2. Show that P is a linear map.
(c) What is the kernel of P?
(d) What is the image of P? Prove your assertions.


14. Let again V be the vector space of all functions. Define the map


\[ Q: V \to V \]


by (Qf)(x) = (f(x) - f(-x))/2.

(a) Show that Q is a linear map.

(b) What is the kernel of Q?

(c) What is the image of Q? Prove your assertion.


Remark. Exercises 11, 12, 13, 14 have certain formal elements in common.
These common features will be discussed later. See Exercises 4 through 7 of
Chapter V, §1.


15. The product space. Let U, W be vector spaces. We let the direct product,
simply called the product, U x W be the set of all pairs (u, w) with $u \in U$ and
$w \in W$. This should not be confused with the product of numbers, the scalar
product, or the cross product of vectors which is sometimes used in physics to
denote a different type of operation. It is an unfortunate historical fact that
the word product is used in two different contexts, and you should get
accustomed to this. For instance, we can view $\mathbf{R}^4$ as a product,


\[ \mathbf{R}^4 = \mathbf{R}^3 \times \mathbf{R}^1 = \mathbf{R}^3 \times \mathbf{R} \]


by viewing a 4-tuple $(x_1, x_2, x_3, x_4)$ as putting side by side the triple
$(x_1, x_2, x_3)$ and the single number $x_4$. Similarly,


\[ \mathbf{R}^4 = \mathbf{R}^2 \times \mathbf{R}^2, \]


by viewing $(x_1, x_2, x_3, x_4)$ as putting side by side $(x_1, x_2)$ and $(x_3, x_4)$.
If $(u_1, w_1)$ and $(u_2, w_2)$ are elements of U x W, so


\[ u_1, u_2 \in U \quad \text{and} \quad w_1, w_2 \in W, \]

we define their sum componentwise, that is we define

\[ (u_1, w_1) + (u_2, w_2) = (u_1 + u_2, w_1 + w_2). \]


If c is a number, define c(u, w) = (cu, cw).

(a) Show that U x W is a vector space with these definitions. What is the
zero element?

(b) Show that dim(U x W) = dim U + dim W. In fact, let $\{u_i\}$ (i = 1,...,n)
be a basis of U and $\{w_j\}$ (j = 1,...,m) be a basis of W. Show that the
elements $\{(u_i, O)\}$ and $\{(O, w_j)\}$ form a basis of U x W.

(c) Let U be a subspace of a vector space V. Show that the subset of V x V
consisting of all elements (u, u) with $u \in U$ is a subspace of V x V.

(d) Let $\{u_i\}$ be a basis of U. Show that the set of elements $(u_i, u_i)$ is a basis
of the subspace in (c). Hence the dimension of this subspace is the same
as the dimension of U.


16. (To be done after you have done Exercise 15.) Let U, W be subspaces of a
vector space V. Show by the indicated method that


\[ \dim U + \dim W = \dim(U + W) + \dim(U \cap W). \]


(a) Show that the map

\[ L: U \times W \to V \]

given by

\[ L(u, w) = u - w \]

is a linear map.
(b) Show that the image of L is U + W.
(c) Show that the kernel of L is the subspace of U x W consisting of all
elements (u, u) where u is in $U \cap W$. What is a basis for this subspace?
What is its dimension?
(d) Apply the dimension formula in the text to conclude the proof.


IV, §4. The Rank and Linear Equations Again 


Let A be an m x n matrix,

\[ A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}. \]

Let $L_A: \mathbf{R}^n \to \mathbf{R}^m$ be the linear map which has been defined previously,
namely


\[ L_A(X) = AX. \]


As we have mentioned, the kernel of $L_A$ is the space of solutions of the
system of linear equations written briefly as


\[ AX = O. \]


Let us now analyze its image.

Let $E^1, \ldots, E^n$ be the standard unit vectors of $\mathbf{R}^n$, written as column
vectors, so


\[ E^1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad E^n = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}. \]


Then ordinary matrix multiplication shows that

\[ A E^j = A^j \]

is the j-th column of A. Consequently for any vector

\[ X = x_1 E^1 + \cdots + x_n E^n, \]


we find that

\[ AX = L_A(X) = x_1 A^1 + \cdots + x_n A^n. \]
Thus we see: 


Theorem 4.1. The image of $L_A$ is the subspace generated by the
columns of A.


In Chapter III, we gave a name to the dimension of that space,
namely the column rank, which we have already seen is equal to the row
rank, and is simply called the rank of A. Now we can interpret this rank
also in the following way:


The rank of A is the dimension of the image of $L_A$.


Theorem 4.2. Let r be the rank of A. Then the dimension of the space
of solutions of AX = O is equal to n - r.


Proof. By Theorem 3.2 we have

\[ \dim \operatorname{Im} L_A + \dim \operatorname{Ker} L_A = n. \]


But $\dim \operatorname{Im} L_A = r$ and $\operatorname{Ker} L_A$ is the space of solutions of the
homogeneous linear equations, so our assertion is now clear.


Example 1. Find the dimension of the space of solutions of the system
of equations

\[ \begin{aligned} 2x - y + z + 2w &= 0, \\ x + y - 2z - w &= 0. \end{aligned} \]

Here the matrix A is


\[ A = \begin{pmatrix} 2 & -1 & 1 & 2 \\ 1 & 1 & -2 & -1 \end{pmatrix}. \]


It has rank 2 because the two vectors 


are easily seen to be linearly independent. [Either use row and column
operations, or do this by linear equations.] Hence the dimension of the
space of solutions is 4 - 2 = 2.
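
The count 4 - 2 = 2 can also be checked by machine. The following is a minimal Python sketch, not the method of the text: the function name `kernel_basis` is an assumption made here, and it uses row reduction over the rationals to produce an explicit basis of the space of solutions of AX = O for the matrix of Example 1.

```python
from fractions import Fraction

def kernel_basis(rows):
    """Return a basis of the solution space of AX = O, where A has the given rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0])
    pivots = []  # pivot column of each pivot row, in order
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        lead = m[r][col]
        m[r] = [a / lead for a in m[r]]  # scale the pivot row so the pivot is 1
        for i in range(len(m)):
            if i != r:
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
    # One basis vector per free (non-pivot) column.
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[free] = Fraction(1)
        for row_index, p in enumerate(pivots):
            v[p] = -m[row_index][free]
        basis.append(v)
    return basis

A = [[2, -1, 1, 2], [1, 1, -2, -1]]
basis = kernel_basis(A)
rank_A = len(A[0]) - len(basis)  # n - dim Ker, as in Theorem 4.2
# Every basis vector really solves both equations.
for v in basis:
    assert all(sum(a * x for a, x in zip(row, v)) == 0 for row in A)
print(len(basis), rank_A)
```

With this choice of free variables the basis found is (1/3, 5/3, 1, 0) and (-1/3, 4/3, 0, 1); any other basis of the same 2-dimensional space would do equally well.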


We recall that the system of linear equations could also be written in
the form

\[ X \cdot A_i = 0 \quad \text{for } i = 1, \ldots, m, \]


where $A_i$ are the rows of the matrix A. This means that X is
perpendicular to each row of A. Then X is also perpendicular to the row space
of A, i.e. to the space generated by the rows. It is now convenient to
introduce some terminology.

Let U be a subspace of $\mathbf{R}^n$. We let


\[ U^\perp = \text{set of all elements } X \text{ in } \mathbf{R}^n \text{ such that } X \cdot Y = 0 \text{ for all } Y \text{ in } U. \]


We call $U^\perp$ the orthogonal complement of U. It is the set of vectors
which are perpendicular to all elements of U, or as we shall also say,
perpendicular to U itself. Then it is easily verified that $U^\perp$ is a subspace
(Exercise 8).

Let U be the subspace generated by the row vectors of the matrix
$A = (a_{ij})$. Then its orthogonal complement $U^\perp$ is precisely the set of
solutions of the homogeneous equations


\[ X \cdot A_i = 0 \quad \text{for all } i. \]


In other words, we have


\[ (\text{row space of } A)^\perp = \operatorname{Ker} L_A = \text{space of solutions of } AX = O. \]


Theorem 4.3. Let U be a subspace of $\mathbf{R}^n$. Then


\[ \dim U + \dim U^\perp = n. \]

Proof. Let r = dim U. If r = 0, then the assertion is obvious. If $r \neq 0$
then U has a basis, and in particular is generated by a finite number of
vectors $A_1, \ldots, A_r$, which may be viewed as the rows of a matrix. Then
the dimension formula is a special case of Theorem 4.2.


In 3-dimensional space, for instance, Theorem 4.3 proves the fact that 
the orthogonal complement of a line is a plane, and vice versa, as shown 
on the figure. 


Figure 2 


In 4-space, the orthogonal complement of a subspace of dimension 1 
has dimension 3. The orthogonal complement of a subspace of dimen- 
sion 2 has also dimension 2. 


Let us now discuss briefly non-homogeneous equations, i.e. a system
of the form


AX = B, 
where B is a given vector (m-tuple). Such a system may not have 
a solution, in other words, the equations may be what is called 
“inconsistent”. 


Example 2. Consider the system


\[ \begin{aligned} 3x - y + z &= 1, \\ 2x + y - z &= 2, \\ x - 2y + 2z &= 5. \end{aligned} \]


It turns out that the third row of the matrix of coefficients


\[ A = \begin{pmatrix} 3 & -1 & 1 \\ 2 & 1 & -1 \\ 1 & -2 & 2 \end{pmatrix} \]

is obtained by subtracting the second row from the first. Hence it
follows at once that the rank of the matrix is 2. On the other
hand, $5 \neq 1 - 2$, so there cannot be a solution to the above system of
equations.
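
The inconsistency in Example 2 can also be detected mechanically. The sketch below relies on a standard fact that is not stated in the text and is assumed here: AX = B has a solution exactly when A and the augmented matrix obtained by adjoining B as an extra column have the same rank. The helper name `rank` is likewise an assumption of this illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows), by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivot rows found so far
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

A = [[3, -1, 1], [2, 1, -1], [1, -2, 2]]
B = [1, 2, 5]
augmented = [row + [b] for row, b in zip(A, B)]  # adjoin B as a fourth column

rank_A = rank(A)
rank_augmented = rank(augmented)
# The ranks differ, so B is not in the image of L_A and the system is inconsistent.
print(rank_A, rank_augmented)  # prints: 2 3
```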

Theorem 4.4. Consider a non-homogeneous system of linear equations


\[ AX = B. \]


Suppose that there exists at least one solution $X_0$. Then the set of
solutions is precisely


\[ X_0 + \operatorname{Ker} L_A. \]

In other words, all the solutions are of the form

\[ X_0 + Y, \quad \text{where } Y \text{ is a solution of } AY = O. \]

Proof. Let $Y \in \operatorname{Ker} L_A$. This means AY = O. Then

\[ A(X_0 + Y) = AX_0 + AY = B + O = B, \]


so $X_0 + \operatorname{Ker} L_A$ is contained in the set of solutions. Conversely, let X
be any solution of AX = B. Then


\[ A(X - X_0) = AX - AX_0 = B - B = O. \]


Hence $X = X_0 + (X - X_0)$ where $X - X_0 = Y$ and AY = O. This
proves the theorem.


When there exists at least one solution to the system AX = B, then
$\dim \operatorname{Ker} L_A$ is called the dimension of the set of solutions. It is the
dimension of the space of solutions of the associated homogeneous system.


Example 3. Find the dimension of the set of solutions of the following
system of equations, and determine this set in $\mathbf{R}^3$.


\[ \begin{aligned} 2x + y + z &= 1, \\ y - z &= 0. \end{aligned} \]


We see by inspection that there is at least one solution, namely
x = 1/2, y = z = 0. The rank of the matrix

\[ \begin{pmatrix} 2 & 1 & 1 \\ 0 & 1 & -1 \end{pmatrix} \]

is 2. Hence the dimension of the set of solutions is 1. The vector space
of solutions of the homogeneous system has dimension 1, and one
solution is easily found to be


\[ y = z = 1, \quad x = -1. \]


Hence the set of solutions of the inhomogeneous system is the set of all
vectors

\[ (1/2, 0, 0) + t(-1, 1, 1), \]


where t ranges over all real numbers. We see that our set of solutions is
a straight line.
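
A quick check of Example 3, as a minimal Python sketch (the names `point_on_line` and `residuals` are chosen here for illustration): every point of the form (1/2, 0, 0) + t(-1, 1, 1) should satisfy both equations, for any value of t.

```python
from fractions import Fraction

X0 = (Fraction(1, 2), Fraction(0), Fraction(0))  # the particular solution
Y = (-1, 1, 1)                                    # a solution of the homogeneous system

def point_on_line(t):
    """The point X0 + tY of the proposed set of solutions."""
    return tuple(a + t * b for a, b in zip(X0, Y))

def residuals(x, y, z):
    """Left-hand side minus right-hand side for each of the two equations."""
    return (2 * x + y + z - 1, y - z)

# Exact rational arithmetic, so the residuals are exactly zero, not merely small.
for t in (-2, 0, 1, Fraction(7, 3)):
    assert residuals(*point_on_line(t)) == (0, 0)
```

Checking finitely many values of t is of course not a proof; the proof is Theorem 4.4.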


Exercises IV, §4 


1. Let A be a non-zero vector in $\mathbf{R}^n$. What is the dimension of the space of
solutions of the equation $A \cdot X = 0$?

2. What is the dimension of the subspace of $\mathbf{R}^6$ perpendicular to the two vectors
(1, 1, -2, 3, 4, 5) and (0, 0, 1, 1, 0, 7)?


3. Let A be a non-zero vector in n-space. Let P be a point in n-space. What is
the dimension of the set of solutions of the equation

\[ X \cdot A = P \cdot A? \]


4. What is the dimension of the space of solutions of the following systems of
linear equations? In each case, find a basis for the space of solutions.

(a) 2x + y - z = 0
    2x + y + z = 0

(b) x - y + z = 0

(c) 4x + 7y - $\pi$z = 0
    2x - y + z = 0

(d) x + y + z = 0
    x - y = 0
    y + z = 0


5. What is the dimension of the space of solutions of the following systems of
linear equations?

(a) 2x - y + z = 0
    x + y + z = 0

(b) 2x + 7y = 0
    x - 2y + z = 0

(c) 2x - 3y + z = 0
    x + y - z = 0
    3x + 4y = 0
    5x + y + z = 0

(d) x + y + z = 0
    2x + 2y + 2z = 0

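
6. Let $L: V \to W$ be a linear map. Using a theorem from the text, prove that

\[ \dim \operatorname{Im} L \leq \dim V. \]


7. Let A, B be two matrices which can be multiplied, i.e. such that AB exists.
Prove that


\[ \operatorname{rank} AB \leq \operatorname{rank} A \quad \text{and} \quad \operatorname{rank} AB \leq \operatorname{rank} B. \]


8. Let U be a subspace of $\mathbf{R}^n$. Prove that $U^\perp$ is also a subspace.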


IV, §5. The Matrix Associated with a Linear Map


To every matrix A we have associated a linear map $L_A$. Conversely,
given a linear map


\[ L: \mathbf{R}^n \to \mathbf{R}^m, \]


we shall now prove that there is some associated matrix A such that
$L = L_A$.

Let $E^1, \ldots, E^n$ be the unit column vectors of $\mathbf{R}^n$. For each j = 1,...,n
let $L(E^j) = A^j$, where $A^j$ is a column vector in $\mathbf{R}^m$. Thus


\[ L(E^1) = A^1 = \begin{pmatrix} a_{11} \\ \vdots \\ a_{m1} \end{pmatrix}, \quad \ldots, \quad L(E^n) = A^n = \begin{pmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{pmatrix}. \]


Then for every element X in $\mathbf{R}^n$ we can write


\[ X = x_1 E^1 + \cdots + x_n E^n = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \]


and therefore


\[ \begin{aligned} L(X) &= x_1 L(E^1) + \cdots + x_n L(E^n) \\ &= x_1 A^1 + \cdots + x_n A^n \\ &= AX, \end{aligned} \]


where A is the matrix whose column vectors are $A^1, \ldots, A^n$. Hence $L = L_A$,
which proves the theorem.


Remark. When dealing with $\mathbf{R}^n$ and $\mathbf{R}^m$, we are able to write column
vectors, so the matrix A was easily derived above. Later in this section
we deal with more general vector spaces, in terms of bases, and we shall 
write coordinate vectors horizontally. This will give rise to a transpose, 
due to the horizontal notation. 


The matrix A above will be called the matrix associated with the 
linear map L. 

As we had seen in studying the column space of A, we can express the
columns of A in terms of the images of the unit vectors:


\[ (*) \qquad L(E^1) = \begin{pmatrix} a_{11} \\ \vdots \\ a_{m1} \end{pmatrix}, \quad \ldots, \quad L(E^n) = \begin{pmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{pmatrix}. \]


Example 1. Let $F: \mathbf{R}^3 \to \mathbf{R}^2$ be the projection, in other words the
mapping such that $F({}^t(x_1, x_2, x_3)) = {}^t(x_1, x_2)$. Then the matrix associated
with F is

\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}. \]
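
The recipe "the j-th column of A is the image of the j-th unit vector" translates directly into a short program. The following Python sketch is an illustration under assumptions made here (the function name `matrix_of` and the representation of vectors as lists are not from the text); it recovers the matrix of the projection of Example 1.

```python
def matrix_of(linear_map, n):
    """Matrix (list of rows) of a linear map on R^n, built column by column."""
    columns = []
    for j in range(n):
        unit = [1 if i == j else 0 for i in range(n)]  # the unit vector E^j
        columns.append(list(linear_map(unit)))          # its image is the j-th column
    m = len(columns[0])
    # Turn the list of columns into a list of rows.
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def F(x):
    """The projection of R^3 onto the first two coordinates."""
    return (x[0], x[1])

A = matrix_of(F, 3)
print(A)  # prints: [[1, 0, 0], [0, 1, 0]]
```

The same function applied to the identity map on R^n returns the unit matrix.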


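Example 2. Let $I: \mathbf{R}^n \to \mathbf{R}^n$ be the identity. Then the matrix associated
with I is the matrix


\[ \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \]


having components equal to 1 on the diagonal, and 0 otherwise.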


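Example 3. Let $L: \mathbf{R}^4 \to \mathbf{R}^2$ be the linear map such that


\[ L(E^1) = \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \quad L(E^2) = \begin{pmatrix} 3 \\ -1 \end{pmatrix}, \quad L(E^3) = \begin{pmatrix} -5 \\ 4 \end{pmatrix}, \quad L(E^4) = \begin{pmatrix} 1 \\ 7 \end{pmatrix}. \]


According to the relations (*), we see that the matrix associated with L
is the matrix

\[ \begin{pmatrix} 2 & 3 & -5 & 1 \\ 1 & -1 & 4 & 7 \end{pmatrix}. \]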


Remark. If instead of column vectors we used row vectors, then to 
find the associated matrix would give rise to a transpose. 


Let V be an n-dimensional vector space. If we pick some basis
$\{v_1, \ldots, v_n\}$ of V, then every element of V can be written in terms of
coordinates


\[ v = x_1 v_1 + \cdots + x_n v_n. \]

Thus to each element v of V we can associate the coordinate vector


\[ X = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \]


If

\[ w = y_1 v_1 + \cdots + y_n v_n, \]


so Y is the coordinate vector of w, then

\[ v + w = (x_1 + y_1) v_1 + \cdots + (x_n + y_n) v_n, \]

so X + Y is the coordinate vector of v + w. Let c be a number. Then

\[ cv = c x_1 v_1 + \cdots + c x_n v_n, \]


so cX is the coordinate vector of cv. Thus after choosing a basis, we can
identify V with $\mathbf{R}^n$ via the coordinate vectors.

Let $L: V \to V$ be a linear map. Then after choosing a basis which
gives us an identification of V with $\mathbf{R}^n$, we can represent L by a
matrix. Different choices of bases will give rise to different associated
matrices. Some choices of bases will often give rise to especially simple 
matrices. 


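Example. Suppose that there exists a basis $\{v_1, \ldots, v_n\}$ and numbers
$c_1, \ldots, c_n$ such that


\[ L v_i = c_i v_i \quad \text{for } i = 1, \ldots, n. \]


Then with respect to this basis, the matrix of L is the diagonal matrix


\[ \begin{pmatrix} c_1 & 0 & \cdots & 0 \\ 0 & c_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_n \end{pmatrix}. \]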


If we picked another basis, the matrix of L might not be so simple. 


The general principle for finding the associated matrix of a linear map
with respect to a basis is as follows.
Let $\{v_1, \ldots, v_n\}$ be the given basis of V. Then there exist numbers $c_{ij}$
such that

\[ \begin{aligned} L v_1 &= c_{11} v_1 + \cdots + c_{1n} v_n, \\ &\;\;\vdots \\ L v_n &= c_{n1} v_1 + \cdots + c_{nn} v_n. \end{aligned} \]

What is the effect of L on the coordinate vector X of an element $v \in V$?
Such an element is of the form


\[ v = x_1 v_1 + \cdots + x_n v_n. \]


Then


\[ Lv = \sum_{i=1}^{n} x_i L(v_i) = \sum_{i=1}^{n} x_i \sum_{j=1}^{n} c_{ij} v_j = \sum_{j=1}^{n} \Bigl( \sum_{i=1}^{n} c_{ij} x_i \Bigr) v_j. \]


Hence we find:


If $C = (c_{ij})$ is the matrix such that $L(v_i) = \sum_{j=1}^{n} c_{ij} v_j$, and X is the
coordinate vector of v, then the coordinate vector of Lv is ${}^tC X$. In
other words, on coordinate vectors, L is represented by the matrix ${}^tC$
(transpose of C).


We note the transpose of C rather than C itself. This is because when
writing $L v_i$ as a linear combination of $v_1, \ldots, v_n$ we have written it
horizontally, whereas before we wrote it vertically in terms of the vertical unit
vectors $E^1, \ldots, E^n$. We call ${}^tC$ the matrix associated with L with respect
to the given basis.


Example. Let $L: V \to V$ be a linear map. Let $\{v_1, v_2, v_3\}$ be a basis of
V such that

\[ \begin{aligned} L(v_1) &= 2v_1 - v_2, \\ L(v_2) &= v_1 + v_2 - 4v_3, \\ L(v_3) &= 5v_1 + 4v_2 + 2v_3. \end{aligned} \]


Then the matrix associated with L on coordinate vectors is the matrix

\[ \begin{pmatrix} 2 & 1 & 5 \\ -1 & 1 & 4 \\ 0 & -4 & 2 \end{pmatrix}. \]

It is the transpose of the matrix


\[ \begin{pmatrix} 2 & -1 & 0 \\ 1 & 1 & -4 \\ 5 & 4 & 2 \end{pmatrix}. \]
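
To see the transpose at work in this example, here is a minimal Python sketch (the helper names `transpose` and `apply` are assumptions of this illustration). The rows of C record L(v_1), L(v_2), L(v_3) written horizontally; applying the transpose of C to the coordinate vector of a basis element returns the coordinates of its image.

```python
def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(column) for column in zip(*M)]

def apply(M, X):
    """Product of the matrix M with the column vector X."""
    return [sum(a * x for a, x in zip(row, X)) for row in M]

# Row i holds the coefficients of L(v_i) in terms of v_1, v_2, v_3.
C = [[2, -1, 0],
     [1, 1, -4],
     [5, 4, 2]]
M = transpose(C)  # the matrix associated with L on coordinate vectors

# v_1 has coordinate vector (1, 0, 0); its image 2v_1 - v_2 has coordinates (2, -1, 0).
assert apply(M, [1, 0, 0]) == [2, -1, 0]
# v_1 + v_2 + v_3 is sent to 8v_1 + 4v_2 - 2v_3.
print(apply(M, [1, 1, 1]))  # prints: [8, 4, -2]
```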


Appendix: Change of Bases 


You may also ask how the matrix representing a linear map changes 
when we change a basis of V. We can easily find the answer as follows. 
First we discuss how coordinates change. 

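Let $\{v_1, \ldots, v_n\}$ be one basis, and $\{w_1, \ldots, w_n\}$ another basis of V.
Let X denote the coordinates of a vector with respect to $\{v_1, \ldots, v_n\}$ and
let Y denote the coordinates of the same vector with respect to
$\{w_1, \ldots, w_n\}$.


How do X and Y differ? We shall now give the answer. Let v be the
element of V having coordinates X, viewed as a column vector. Thus


\[ v = x_1 v_1 + \cdots + x_n v_n. \]


This looks like a dot product, and it will be convenient to use the
notation


\[ v = {}^tX \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} = (x_1, \ldots, x_n) \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}, \]


where ${}^tX$ is now a row n-tuple. Similarly,


\[ v = {}^tY \begin{pmatrix} w_1 \\ \vdots \\ w_n \end{pmatrix}. \]


We can express each $w_i$ as a linear combination of the basis elements
$v_1, \ldots, v_n$, so there exists a matrix $B = (b_{ij})$ such that for each i,

\[ w_i = \sum_{j=1}^{n} b_{ij} v_j. \]

But we can write these relations more efficiently in matrix form


\[ \begin{pmatrix} w_1 \\ \vdots \\ w_n \end{pmatrix} = \begin{pmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & & \vdots \\ b_{n1} & \cdots & b_{nn} \end{pmatrix} \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} = B \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}. \]


Therefore the relation for v in terms of basis elements can be written in
the form

\[ v = {}^tY \begin{pmatrix} w_1 \\ \vdots \\ w_n \end{pmatrix} = {}^tY B \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} \quad \text{and also} \quad v = {}^tX \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}. \]


Therefore we must have ${}^tY B = {}^tX$. Taking the transpose gives us the
desired relation


\[ X = {}^tB Y. \]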


Again notice the transpose. The change of coordinates from one basis to
another is given in terms of the transpose of the matrix expressing each
$w_i$ as a linear combination of $v_1, \ldots, v_n$.


Remark. The matrix B is invertible.

Proof. There are several ways of seeing this. For instance, by the
same arguments we have given, going from one basis to another, there is
a matrix C such that

\[ Y = {}^tC X. \]

This is true for all coordinate n-tuples X and Y. Thus we obtain


\[ X = {}^tB \, {}^tC X = {}^t(CB) X \]


for all n-tuples X. Hence ${}^t(CB) = I$, so CB = I is the identity matrix. Similarly
BC = I, so B is invertible.


Now let $L: V \to V$ be a linear map. Let M be the matrix representing
L with respect to the basis $\{v_1, \ldots, v_n\}$ and let M' be the matrix
representing L with respect to the basis $\{w_1, \ldots, w_n\}$. By definition,


\[ \text{the coordinates of } L(v) \text{ are } \begin{cases} MX & \text{with respect to } \{v_1, \ldots, v_n\}, \\ M'Y & \text{with respect to } \{w_1, \ldots, w_n\}. \end{cases} \]


By what we have just seen, we must have


\[ MX = {}^tB M' Y. \]

Substitute $X = {}^tB Y$ and multiply both sides on the left by $({}^tB)^{-1}$. Then
we find


\[ ({}^tB)^{-1} M \, {}^tB Y = M' Y. \]


This is true for all Y. If we let $N = {}^tB$ then we obtain the matrix M' in
terms of M, namely


\[ M' = N^{-1} M N \quad \text{where } N = {}^tB. \]


Thus the matrix representing the linear map changes by a similarity
transformation. We may also say that M, M' are similar. In general, two
matrices M, M' are called similar if there exists an invertible matrix N
such that $M' = N^{-1} M N$.
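
A small numerical instance may help fix the formula for a similarity transformation. In the Python sketch below, the matrices M and N are hypothetical values chosen only for illustration, and the helpers `matmul` and `inverse_2x2` are assumptions of this example; it computes the product of the inverse of N with M and N, and checks the equivalent relation N M' = M N.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse_2x2(N):
    """Inverse of an invertible 2 x 2 matrix, in exact rational arithmetic."""
    (a, b), (c, d) = N
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("N must be invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

M = [[2, 0], [0, 3]]  # a diagonal matrix (hypothetical example)
N = [[1, 1], [0, 1]]  # an invertible change-of-basis matrix (hypothetical example)

M_prime = matmul(inverse_2x2(N), matmul(M, N))
print(M_prime == [[2, -1], [0, 3]])  # prints: True
# Equivalent form of the relation, avoiding the inverse: N M' = M N.
assert matmul(N, M_prime) == matmul(M, N)
```

Here M' is no longer diagonal, although it represents the same linear map with respect to a different basis.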

In practice, one should not pick bases too quickly. For many prob- 
lems one should select a basis for which the matrix representing the 
linear map is simplest, and work with that basis. 


Example. Suppose that with respect to some basis the matrix M re- 
presenting L is diagonal, say 


2 0 
M= : 
Then the matrix representing L with respect to another basis will be of
the form

\[ N^{-1} M N, \]


which may look like a horrible mess. Changing N arbitrarily
corresponds to picking an arbitrary basis (of course, N must be invertible).
When we study eigenvectors later, we shall find conditions under which a 
matrix representing a linear map is diagonal with respect to a suitable 
choice of basis. 


Exercises IV, §5 


1. Find the matrix associated with the following linear maps.
(a) $F: \mathbf{R}^4 \to \mathbf{R}^2$ given by $F({}^t(x_1, x_2, x_3, x_4)) = {}^t(x_1, x_2)$ (the projection).
(b) The projection from $\mathbf{R}^4$ to $\mathbf{R}^3$.
(c) $F: \mathbf{R}^2 \to \mathbf{R}^2$ given by $F({}^t(x, y)) = {}^t(3x, 3y)$.
(d) $F: \mathbf{R}^n \to \mathbf{R}^n$ given by F(X) = 7X.
(e) $F: \mathbf{R}^n \to \mathbf{R}^n$ given by F(X) = -X.
(f) $F: \mathbf{R}^4 \to \mathbf{R}^4$ given by $F({}^t(x_1, x_2, x_3, x_4)) = {}^t(x_1, x_2, 0, 0)$.


2. Let c be a number, and let $L: \mathbf{R}^n \to \mathbf{R}^n$ be the linear map such that L(X) = cX.
What is the matrix associated with this linear map?

3. Let $F: \mathbf{R}^3 \to \mathbf{R}^2$ be the indicated linear map. What is the associated matrix of
F?


1 —4 3 
(a) FED =(_ Jj RE» =( J F(E?) -(1) 


\[ \text{(b)} \quad F \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 3x_1 - 2x_2 + x_3 \\ 4x_1 - x_2 + 5x_3 \end{pmatrix} \]


4. Let V be a 3-dimensional space with basis $\{v_1, v_2, v_3\}$. Let $F: V \to V$ be the
linear map as indicated. Find the matrix of F with respect to the given basis.
(a) F(v,) = 3v; — v3, 

F(v,) = v, — 20; + v5, 

F(v4) = —2v, + dv; + 5v,. 
(b) $F(v_1) = 3v_1$, $F(v_2) = -7v_2$, $F(v_3) = 5v_3$.
(c) F(v,) = —2v, + mw, 

F(v;) = —03, 

F(v3) = 0,. 


5. In the text, we gave a description of a matrix associated with a linear map of
a vector space into itself, with respect to a basis. More generally, let V, W be
two vector spaces. Let $\{v_1, \ldots, v_n\}$ be a basis of V, and $\{w_1, \ldots, w_m\}$ a basis of
W. Let $L: V \to W$ be a linear map. Describe how you would associate a
matrix with this linear map, giving the effect on coordinate vectors.


6. Let $L: V \to V$ be a linear map. Let $v \in V$. We say that v is an eigenvector for
L if there exists a number c such that L(v) = cv. Suppose that V has a basis
$\{v_1, \ldots, v_n\}$ consisting of eigenvectors, with $L(v_i) = c_i v_i$ for i = 1,...,n. What is
the matrix representing L with respect to this basis?


7. Let V be the vector space generated by the two functions $f_1(t) = \cos t$ and
$f_2(t) = \sin t$. Let D be the derivative. What is the matrix of D with respect to
the basis $\{f_1, f_2\}$?


8. Let V be the vector space generated by the three functions $f_1(t) = 1$, $f_2(t) = t$,
$f_3(t) = t^2$. Let $D: V \to V$ be the derivative. What is the matrix of D with
respect to the basis $\{f_1, f_2, f_3\}$?


CHAPTER V 


Composition and Inverse 
Mappings 


V, §1. Composition of Linear Maps


Let U, V, W be sets. Let

\[ F: U \to V \quad \text{and} \quad G: V \to W \]


be mappings. Then we can form the composite mapping from U into W,
denoted by $G \circ F$. It is by definition the mapping such that


\[ (G \circ F)(u) = G(F(u)) \quad \text{for all } u \text{ in } U. \]


Example 1. Let A be an m x n matrix, and let B be a q x m matrix.
Then we may form the product BA. Let


\[ L_A: \mathbf{R}^n \to \mathbf{R}^m \text{ be the linear map such that } L_A(X) = AX \]

and let

\[ L_B: \mathbf{R}^m \to \mathbf{R}^q \text{ be the linear map such that } L_B(Y) = BY. \]

Then we may form the composite linear map $L_B \circ L_A$ such that


\[ (L_B \circ L_A)(X) = L_B(L_A(X)) = L_B(AX) = BAX. \]


Thus we have

\[ L_B \circ L_A = L_{BA}. \]

We see that composition of linear maps corresponds to multiplication of
matrices.
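
The correspondence between composition and matrix multiplication is easy to test on a small case. In the following Python sketch the matrices A, B and the vector X are hypothetical values chosen for illustration, and `matmul` and `apply` are helper names assumed here; applying A and then B to X gives the same result as applying the single matrix BA.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def apply(A, X):
    """The linear map X -> AX, with X a list of numbers."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

A = [[1, 2, 0], [0, 1, -1]]    # 2 x 3, so the map X -> AX goes from R^3 to R^2
B = [[1, 0], [2, 1], [0, 3]]   # 3 x 2, so the map Y -> BY goes from R^2 to R^3
X = [1, 1, 2]

composite = apply(B, apply(A, X))   # first apply A, then B
product = apply(matmul(B, A), X)    # apply the single matrix BA
print(composite, product)  # prints: [3, 5, -3] [3, 5, -3]
```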


Example 2. Let A be an m x n matrix, and let

\[ L_A: \mathbf{R}^n \to \mathbf{R}^m \]


be the usual linear map such that $L_A(X) = AX$. Let C be a vector in $\mathbf{R}^m$
and let

\[ T_C: \mathbf{R}^m \to \mathbf{R}^m \]


be the translation by C, that is $T_C(Y) = Y + C$. Then the composite
mapping $T_C \circ L_A$ is obtained by first applying $L_A$ to a vector X, and then
translating by C. Thus


\[ (T_C \circ L_A)(X) = T_C(L_A(X)) = T_C(AX) = AX + C. \]


Example 3. Let V be a vector space, and let w be an element of V.
Let

\[ T_w: V \to V \]


be the translation by w, that is the map such that $T_w(v) = v + w$. Then
we have


\[ T_{w_1}(T_{w_2}(v)) = T_{w_1}(v + w_2) = v + w_2 + w_1. \]


Thus

\[ T_{w_1} \circ T_{w_2} = T_{w_1 + w_2}. \]

We can express this by saying that the composite of two translations is
again a translation. Of course, the translation $T_w$ is not a linear map if
$w \neq O$ because

\[ T_w(O) = O + w = w \neq O, \]


and we know that a linear map has to send O on O.

Example 4. Rotations. Let $\theta$ be a number, and let $A(\theta)$ be the matrix


\[ A(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \]

Then $A(\theta)$ represents a rotation which we may denote by $R_\theta$. The
composite rotation $R_{\theta_1} \circ R_{\theta_2}$ is obtained from the multiplication of
matrices, and for any vector X in $\mathbf{R}^2$ we have


\[ R_{\theta_1} \circ R_{\theta_2}(X) = A(\theta_1) A(\theta_2) X. \]

This composite rotation is just rotation by the sum of the angles, namely
$\theta_1 + \theta_2$. This corresponds to the formula $A(\theta_1)A(\theta_2) = A(\theta_1 + \theta_2)$.
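
The addition formula for rotation matrices can be checked numerically. The Python sketch below is an illustration only; the function names `A` and `matmul` and the sample angles are assumptions made here. Because floating-point arithmetic is used, the comparison is made up to a small tolerance rather than exactly.

```python
import math

def A(theta):
    """The rotation matrix for the angle theta (in radians)."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

theta1, theta2 = 0.3, 1.1  # arbitrary sample angles
lhs = matmul(A(theta1), A(theta2))
rhs = A(theta1 + theta2)

# Compare entry by entry, allowing for rounding error.
same = all(math.isclose(lhs[i][j], rhs[i][j], abs_tol=1e-12)
           for i in range(2) for j in range(2))
print(same)  # prints: True
```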


The following statement is an important property of mappings.
Let U, V, W, S be sets. Let


\[ F: U \to V, \quad G: V \to W, \quad \text{and} \quad H: W \to S \]


be mappings. Then

\[ H \circ (G \circ F) = (H \circ G) \circ F. \]


Proof. Here again, the proof is very simple. By definition, we have,
for any element u of U:


\[ (H \circ (G \circ F))(u) = H((G \circ F)(u)) = H(G(F(u))). \]

On the other hand,

\[ ((H \circ G) \circ F)(u) = (H \circ G)(F(u)) = H(G(F(u))). \]

By definition, this means that $(H \circ G) \circ F = H \circ (G \circ F)$.


Theorem 1.1. Let U, V, W be vector spaces. Let

\[ F: U \to V \quad \text{and} \quad G: V \to W \]

be linear maps. Then the composite map $G \circ F$ is also a linear map.


Proof. This is very easy to prove. Let u, v be elements of U. Since F
is linear, we have F(u + v) = F(u) + F(v). Hence


\[ (G \circ F)(u + v) = G(F(u + v)) = G(F(u) + F(v)). \]

Since G is linear, we obtain


\[ G(F(u) + F(v)) = G(F(u)) + G(F(v)). \]

Hence


\[ (G \circ F)(u + v) = (G \circ F)(u) + (G \circ F)(v). \]

Next, let c be a number. Then


\[ \begin{aligned} (G \circ F)(cu) &= G(F(cu)) \\ &= G(cF(u)) && \text{(because F is linear)} \\ &= cG(F(u)) && \text{(because G is linear).} \end{aligned} \]


This proves that $G \circ F$ is a linear mapping.


The next theorem states that some of the rules of arithmetic concern- 
ing the product and sum of numbers also apply to the composition and 
sum of linear mappings. 


Theorem 1.2. Let U, V, W be vector spaces. Let

\[ F: U \to V \]


be a linear mapping, and let G, H be two linear mappings of V into W.
Then


\[ (G + H) \circ F = G \circ F + H \circ F. \]


If c is a number, then


\[ (cG) \circ F = c(G \circ F). \]

If $T: U \to V$ is a linear mapping from U into V, then

\[ G \circ (F + T) = G \circ F + G \circ T. \]


The proofs are all simple. We shall just prove the first assertion and
leave the others as exercises.
Let u be an element of U. We have:


\[ \begin{aligned} ((G + H) \circ F)(u) &= (G + H)(F(u)) = G(F(u)) + H(F(u)) \\ &= (G \circ F)(u) + (H \circ F)(u). \end{aligned} \]


By definition, it follows that $(G + H) \circ F = G \circ F + H \circ F$.

As with matrices, we see that composition and addition of linear maps
behave like multiplication and addition of numbers. However, the same
warning as with matrices applies here. First, we may not have commutativ- 
ity, and second we do not have “division”, except as discussed in the next 
section for inverses, when they exist. 


Example 5. Let

\[ F: \mathbf{R}^3 \to \mathbf{R}^3 \]

be the linear map given by

\[ F(x, y, z) = (x, y, 0) \]

and let G be the linear mapping given by

\[ G(x, y, z) = (x, z, 0). \]

Then $(G \circ F)(x, y, z) = (x, 0, 0)$, but $(F \circ G)(x, y, z) = (x, z, 0)$.
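
The failure of commutativity in Example 5 can be seen by evaluating both composites at a single vector. A minimal Python sketch follows; the helper `compose` and the test vector (1, 2, 3) are choices made here for illustration.

```python
def F(v):
    """F(x, y, z) = (x, y, 0)."""
    x, y, z = v
    return (x, y, 0)

def G(v):
    """G(x, y, z) = (x, z, 0)."""
    x, y, z = v
    return (x, z, 0)

def compose(outer, inner):
    """The composite mapping: apply inner first, then outer."""
    return lambda v: outer(inner(v))

G_after_F = compose(G, F)
F_after_G = compose(F, G)

v = (1, 2, 3)
print(G_after_F(v), F_after_G(v))  # prints: (1, 0, 0) (1, 3, 0)
```

The two results differ as soon as z is not 0, so the two composites are different mappings.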


On the other hand, let

\[ L: V \to V \]


be a linear map of a vector space into itself. We may iterate L several
times, so as usual we let


\[ L^2 = L \circ L, \quad L^3 = L \circ L \circ L, \quad \text{and so forth.} \]


We also let

\[ L^0 = I = \text{identity mapping.} \]


Thus $L^k$ is the iteration of L with itself k times. For such powers of L,
we do have commutativity, namely


\[ L^{r+s} = L^r \circ L^s = L^s \circ L^r. \]


Exercises V, §1 


1. Let A, B be two m x n matrices. Assume that 
AX = BX 


for all n-tuples X. Show that A = B. 


2. Let F, L be linear maps of V into itself. Assume that F, L commute, that is
$F \circ L = L \circ F$. Prove the usual rules:


\[ \begin{aligned} (F + L)^2 &= F^2 + 2F \circ L + L^2, \\ (F - L)^2 &= F^2 - 2F \circ L + L^2, \\ (F + L) \circ (F - L) &= F^2 - L^2. \end{aligned} \]


3. Prove the usual rule for a linear map $F: V \to V$:


\[ (I - F) \circ (I + F + \cdots + F^r) = I - F^{r+1}. \]

4. Let V be a vector space and $T: V \to V$ a linear map such that $T^2 = I$. Define
$P = \tfrac{1}{2}(I + T)$ and $Q = \tfrac{1}{2}(I - T)$.
(a) Show that $P^2 = P$, and $Q^2 = Q$.
(b) Show that P + Q = I.
(c) Show that Ker P = Im Q and Im P = Ker Q.


5. Let $P: V \to V$ be a linear map such that $P^2 = P$. Define Q = I - P.
(a) Show that $Q^2 = Q$.
(b) Show that Im P = Ker Q and Ker P = Im Q.
A linear map P such that $P^2 = P$ is called a projection. It generalizes the
notion of projection in the usual sense.


6. Let $P: V \to V$ be a projection, that is a linear map such that $P^2 = P$.
(a) Show that V = Ker P + Im P.
(b) Show that the intersection of Im P and Ker P is {O}. In other words, if
$v \in \operatorname{Im} P$ and $v \in \operatorname{Ker} P$, then v = O.


Let V be a vector space, and let U, W be subspaces. One says that V is a 
direct sum of U and W if the following conditions are satisfied: 


\[ V = U + W \quad \text{and} \quad U \cap W = \{O\}. \]


In Exercise 6, you have proved that V is the direct sum of Ker P and Im P. 


7. Let V be the direct sum of subspaces U and W. Let $v \in V$ and suppose we
have expressed v as a sum v = u + w with $u \in U$ and $w \in W$. Show that u, w
are uniquely determined by v. That is, if $v = u_1 + w_1$ with $u_1 \in U$ and $w_1 \in W$,
then $u = u_1$ and $w = w_1$.


8. Let U, W be two vector spaces, and let V = U × W be the set of all pairs
(u, w) with u ∈ U and w ∈ W. Then V is a vector space, as described in Exercise
15 of Chapter IV, §3. Let

P: V → V
be the map such that P(u, w) = (u, 0). Show that P is a projection.


If you identify U with the set of all elements (u, 0) with u ∈ U, and identify W
with the set of all elements (0, w) with w ∈ W, then V is the direct sum of U and
W. For example, let n = r + s where r, s are positive integers. Then

R^n = R^r × R^s.
Note that R^n is the set of all n-tuples of real numbers, which can be viewed as

(x_1, ..., x_r, y_1, ..., y_s) with x_i, y_j ∈ R.

Thus R^n may be viewed as the direct sum of R^r and R^s. The projection of R^n on
the first r components is given by the map P such that

P(x_1, ..., x_r, y_1, ..., y_s) = (x_1, ..., x_r, 0, ..., 0).
This map is a linear map, and P^2 = P. Sometimes one also calls the map such
that
P(x_1, ..., x_n) = (x_1, ..., x_r)

a projection, but it maps R^n into R^r, so we cannot apply P again to
(x_1, ..., x_r), since P is defined on R^n and not on R^r.

Note that we could also take the projection on the second set of coordinates,
that is, we let

Q(x_1, ..., x_r, y_1, ..., y_s) = (0, ..., 0, y_1, ..., y_s).

This is called the projection on the second factor of R^n, viewed as R^r × R^s.


9. Let A, B be two matrices so that the product AB is defined. Prove that

rank AB ≤ rank A and rank AB ≤ rank B.
[Hint: Consider the linear maps L_AB, L_A, and L_B.]

In fact, define the rank of a linear map L to be dim Im L. If L: V → W and
F: W → U are linear maps of finite dimensional spaces, show that

rank F ∘ L ≤ rank F and rank F ∘ L ≤ rank L.


10. Let A be an n × n matrix, and let C be a vector in R^n. Let T_C be the
translation by C as in Example 2. Write out in full the formulas for

L_A ∘ T_C(X) and T_C ∘ L_A(X),
where X is in R^n. Give an example of A and C such that

L_A ∘ T_C ≠ T_C ∘ L_A.


V, §2. Inverses 


Let
F: V → W

be a mapping (which in the applications is linear). We say that F has an
inverse if there exists a mapping

G: W → V
such that
G ∘ F = I_V and F ∘ G = I_W.

By this we mean that the composite maps G ∘ F and F ∘ G are the identity
mappings of V and W, respectively. If F has an inverse, we also say
that F is invertible.

Example 1. The inverse mapping of the translation T_u is the translation
T_(-u), because

T_(-u) ∘ T_u(v) = T_(-u)(v + u) = v + u - u = v.

Thus
T_(-u) ∘ T_u = I.

Similarly, T_u ∘ T_(-u) = I.


Example 2. Let A be a square n × n matrix, and let
L_A: R^n → R^n

be the usual linear map such that L_A(X) = AX. Assume that A has an
inverse matrix A^(-1), so that AA^(-1) = A^(-1)A = I. Then the formula

L_A ∘ L_B = L_AB
of the preceding sections shows that
L_A ∘ L_(A^(-1)) = L_I = I,

and likewise L_(A^(-1)) ∘ L_A = I. Hence L_A has an inverse mapping, which is
precisely multiplication by A^(-1).


Theorem 2.1. Let F: U → V be a linear map, and assume that this map
has an inverse mapping G: V → U. Then G is a linear map.


Proof. Let v_1, v_2 ∈ V. We must first show that
G(v_1 + v_2) = G(v_1) + G(v_2).
Let u_1 = G(v_1) and u_2 = G(v_2). By definition, this means that
F(u_1) = v_1 and F(u_2) = v_2.
Since F is linear, we find that
F(u_1 + u_2) = F(u_1) + F(u_2) = v_1 + v_2.
By definition of the inverse map, this means that G(v_1 + v_2) = u_1 + u_2,
thus proving what we wanted. We leave the proof that G(cv) = cG(v) as
an exercise (Exercise 2).

Example 3. Let L: V → V be a linear map such that L^2 = O. Then
I + L is invertible, because

(I + L)(I - L) = I - L^2 = I,
and similarly on the other side, (I - L)(I + L) = I. Thus we have
(I + L)^(-1) = I - L.
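Example 3 can be checked on a concrete matrix. The sketch below, not from the original text, assumes the particular 2 × 2 matrix L shown (an arbitrary choice with L^2 = O) and hypothetical helpers matmul, add and sub.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

I = [[1, 0], [0, 1]]
O = [[0, 0], [0, 0]]
L = [[0, 1], [0, 0]]          # assumed sample matrix with L^2 = O

L_squared = matmul(L, L)
one_side = matmul(add(I, L), sub(I, L))     # (I + L)(I - L)
other_side = matmul(sub(I, L), add(I, L))   # (I - L)(I + L)
```

Both products come out equal to I, so I - L is indeed the inverse of I + L for this choice of L.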

We shall now express inverse mappings in somewhat different terminology.

Let

F: V → W

be a mapping. We say that F is injective (in older terminology, one-to-one)
if given elements v_1, v_2 in V such that v_1 ≠ v_2 then F(v_1) ≠ F(v_2).
We are mostly interested in this notion for linear mappings.


Example 4. Suppose that F is a linear map whose kernel is not {O}.
Then there is an element v ≠ O in the kernel, and we have

F(O) = F(v) = O.

Hence F is not injective. We shall now prove the converse.


Theorem 2.2. A linear map F: V → W is injective if and only if its
kernel is {O}.


Proof. We have already proved one implication. Conversely, assume
that the kernel is {O}. We must prove that F is injective. Let v_1 ≠ v_2 be
distinct elements of V. We must show that F(v_1) ≠ F(v_2). But

F(v_1) - F(v_2) = F(v_1 - v_2) because F is linear.

Since the kernel of F is {O}, and v_1 - v_2 ≠ O, it follows that
F(v_1 - v_2) ≠ O. Hence F(v_1) - F(v_2) ≠ O, so F(v_1) ≠ F(v_2). This proves
the theorem.


Let F: V → W be a mapping. If the image of F is all of W then we
say that F is surjective. The two notions for a mapping to be injective
or surjective combine to give a basic criterion for F to have an inverse.


Theorem 2.3. A mapping F: V → W has an inverse if and only if it is
both injective and surjective.

Proof. Suppose F is both injective and surjective. Given an element w
in W, there exists an element v in V such that F(v) = w (because F is
surjective). There is only one such element v (because F is injective).
Thus we may define

G(w) = unique element v such that F(v) = w.
By the way we have defined G, it is then clear that
G(F(v)) = v and F(G(w)) = w.

Thus G is the inverse mapping of F.
Conversely, suppose F has an inverse mapping G. Let v_1, v_2 be
elements of V such that F(v_1) = F(v_2). Applying G yields

v_1 = G ∘ F(v_1) = G ∘ F(v_2) = v_2,
so F is injective. Secondly, let w be an element of W. The equation
w = F(G(w))

shows that w = F(v) for some v, namely v = G(w), so F is surjective.
This proves the theorem.


In the case of linear maps, we have certain tests for injectivity, or 
surjectivity, which allow us to verify fewer conditions when we wish to 
prove that a linear map is invertible. 


Theorem 2.4. Let F: V → W be a linear map. Assume that
dim V = dim W.


(i) If Ker F = {O} then F is invertible. 
(ii) If F is surjective, then F is invertible. 


Proof. Suppose first that Ker F = {O}. Then F is injective by Theorem 
2.2. But 
dim V = dim Ker F + dim Im F, 


so dim V = dim Im F, and the image of F is a subspace of W having the 
same dimension as W. Hence Im F = W by Theorem 5.6 of Chapter III. 
Hence F is surjective. This proves (i) by using Theorem 2.3. 

The proof of (ii) will be left as an exercise. 


Example 5. Let F: R^2 → R^2 be the linear map such that

F(x, y) = (3x - y, 4x + 2y).
We wish to show that F has an inverse. First note that the kernel of F
is {O}, because if
3x - y = 0,
4x + 2y = 0,
then we can solve for x, y in the usual way: Multiply the first equation
by 2 and add it to the second. We find 10x = 0, whence x = 0, and then
y = 0 because y = 3x. Hence F is injective, because its kernel is {O}.
Hence F is invertible by Theorem 2.4(i).
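The text proves that the inverse exists without writing it down. As an illustration only, the sketch below assumes the explicit candidate G(u, v) = ((2u + v)/10, (-4u + 3v)/10), obtained from the usual formula for the inverse of a 2 × 2 matrix (this formula is not part of Example 5), and checks on sample points that G ∘ F and F ∘ G act as the identity.

```python
from fractions import Fraction

def F(p):
    x, y = p
    return (3 * x - y, 4 * x + 2 * y)

def G(q):
    # Assumed candidate inverse; the determinant of the matrix of F is 10.
    u, v = q
    return (Fraction(2 * u + v, 10), Fraction(-4 * u + 3 * v, 10))

# Search a small grid of integer points for non-zero elements of the kernel.
kernel_on_grid = [(x, y)
                  for x in range(-5, 6)
                  for y in range(-5, 6)
                  if F((x, y)) == (0, 0)]

round_trip_1 = G(F((5, -7)))   # should give back (5, -7)
round_trip_2 = F(G((1, 2)))    # should give back (1, 2)
```

Exact fractions are used so that the round trips can be compared with the starting points without rounding error.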


A linear map F: U → V which has an inverse G: V → U (we also say
invertible) is called an isomorphism.


Example 6. Let V be a vector space of dimension n. Let

{v_1, ..., v_n}

be a basis for V. Let
L: R^n → V
be the map such that
L(x_1, ..., x_n) = x_1 v_1 + ... + x_n v_n.
Then L is an isomorphism.
Proof. The kernel of L is {O}, because if
x_1 v_1 + ... + x_n v_n = O,
then all x_i = 0 (since v_1, ..., v_n are linearly independent). The image of L
is all of V, because v_1, ..., v_n generate V. By Theorem 2.4, it follows that
L is an isomorphism.


Theorem 2.5. A square matrix A is invertible if and only if its columns
A^1, ..., A^n are linearly independent.


Proof. By Theorem 2.4, the linear map L_A is invertible if and only if
Ker L_A = {O}. But Ker L_A consists of those n-tuples X such that

x_1 A^1 + ... + x_n A^n = O,

in other words, those X giving a relation of linear dependence among
the columns. Hence the theorem follows.

Exercises V, §2


1. Let R_θ be rotation counterclockwise by an angle θ. How would you express
the inverse R_θ^(-1) as R_φ for some φ in a simple way? If

    ( cos θ   -sin θ )
    ( sin θ    cos θ )

is the matrix associated with R_θ, what is the matrix associated with R_θ^(-1)?


2. (a) Finish the proof of Theorem 2.1.

(b) Give the proof of Theorem 2.4(ii).


3. Let F, G be invertible linear maps of a vector space V onto itself. Show that

(F ∘ G)^(-1) = G^(-1) ∘ F^(-1).


4. Let L: R^2 → R^2 be the linear map defined by

L(x, y) = (x + y, x - y).

Show that L is invertible.


5. Let L: R^2 → R^2 be the linear map defined by

L(x, y) = (2x + y, 3x - 5y).

Show that L is invertible.


6. Let L: R^3 → R^3 be the linear maps as indicated. Show that L is invertible in
each case.
(a) L(x, y, z) = (x - y, x + z, x + y + 3z)
(b) L(x, y, z) = (2x - y + z, x + y, 3x + y + z).


7. Let L: V → V be a linear mapping such that L^2 = O. Show that I - L is
invertible. (I is the identity mapping on V.)


8. Let L: V → V be a linear map such that L^2 + 2L + I = O. Show that L is
invertible.


9. Let L: V → V be a linear map such that L^3 = O. Show that I - L is
invertible.


10. Let L: V → V be a linear map such that L^n = O. Show that I - L is
invertible.


11. Let V be a two-dimensional vector space, and let L: V → V be a linear map
such that L^2 = O, but L ≠ O. Let v be an element of V such that L(v) ≠ O.
Let w = L(v). Prove that {v, w} is a basis of V.


12. Let V be the set of all infinite sequences of real numbers

(x_1, x_2, x_3, ...).
This could be called infinite dimensional space. Addition and multiplication
by numbers are defined componentwise, so V is a vector space. Define the
map F: V → V by

F(x_1, x_2, x_3, ...) = (0, x_1, x_2, x_3, ...).

For obvious reasons, F is called the shift operator, and F is linear.
(a) Is F injective? What is the kernel of F?

(b) Is F surjective?

(c) Show that there is a linear map G: V → V such that G ∘ F = I.
(d) Does the map G of (c) have the property that F ∘ G = I?


13. Let V be a vector space, and let U, W be two subspaces. Assume that V is
the direct sum of U and W, that is

V = U + W and U ∩ W = {O}.
Let L: U × W → V be the mapping such that
L(u, w) = u + w.

Show that L is a bijective linear map. (So prove that L is linear, L is
injective, L is surjective.)


CHAPTER VI 


Scalar Products and 
Orthogonality 


VI, §1. Scalar Products


Let V be a vector space. A scalar product on V is an association which
to any pair of elements (v, w) of V associates a number, denoted by
<v, w>, satisfying the following properties:


SP 1. We have <v, w> = <w, v> for all v, w in V.
SP 2. If u, v, w are elements of V, then
<u, v + w> = <u, v> + <u, w>.
SP 3. If x is a number, then
<xu, v> = x<u, v> = <u, xv>.
We shall also assume that the scalar product satisfies the condition:
SP 4. For all v in V we have <v, v> ≥ 0, and <v, v> > 0 if v ≠ O.
A scalar product satisfying this condition is called positive definite.


For the rest of this section we assume that V is a vector space with a
positive definite scalar product.

Example 1. Let V = R^n, and define
<X, Y> = X · Y

for elements X, Y of R^n. Then this is a positive definite scalar product.


Example 2. Let V be the space of continuous real-valued functions on
the interval [-π, π]. If f, g are in V, we define

<f, g> = ∫_{-π}^{π} f(t)g(t) dt.


Simple properties of the integral show that this is a scalar product, 
which is in fact positive definite. 


In calculus, we study the second example, which gives rise to the 
theory of Fourier series. Here we discuss only general properties of 
scalar products and applications to euclidean spaces. The notation < , > 
is used because in dealing with vector spaces of functions, a dot f · g
may be confused with the ordinary product of functions.

As in the case of the dot product, we define elements v, w of V to be
orthogonal, or perpendicular, and write v ⊥ w, if <v, w> = 0. If S is a
subset of V, we denote by S^⊥ the set of all elements w in V which are
perpendicular to all elements of S, i.e. such that <w, v> = 0 for all v in S.
Then using SP 1, SP 2, and SP 3, one verifies at once that S^⊥ is a
subspace of V, called the orthogonal space of S. If w is perpendicular to S,
we also write w ⊥ S. Let U be the subspace of V generated by the
elements of S. If w is perpendicular to S, and if v_1, v_2 are in S, then

<w, v_1 + v_2> = <w, v_1> + <w, v_2> = 0.

If c is a number, then

<w, cv> = c<w, v> = 0.


Hence w is perpendicular to linear combinations of elements of S, and 
hence w is perpendicular to U. 


Example 3. Let (a_ij) be an m × n matrix, and let A_1, ..., A_m be its row
vectors. Let X = (x_1, ..., x_n) as usual. The system of homogeneous linear
equations

        a_11 x_1 + ... + a_1n x_n = 0
(**)    ...
        a_m1 x_1 + ... + a_mn x_n = 0

can also be written in abbreviated form using the dot product, as
A_i · X = 0 for i = 1, ..., m.

The set of solutions X of this homogeneous system is therefore the set of
all vectors perpendicular to A_1, ..., A_m. It is therefore the subspace of R^n
which is the orthogonal subspace to the space generated by A_1, ..., A_m.
If U is the space of solutions, and if W denotes the space generated by
A_1, ..., A_m, we have

U = W^⊥.


We call dim U the dimension of the space of solutions of the system of 
linear equations. 


As in Chapter I, we define the length, or norm of an element v ∈ V by

||v|| = √<v, v>.
If c is any number, then we immediately get
||cv|| = |c| ||v||,

because

||cv|| = √<cv, cv> = √(c^2 <v, v>) = |c| ||v||.


Thus we see the same type of arguments as in Chapter I apply here. In 
fact, any argument given in Chapter I which does not use coordinates 
applies to our more general situation. We shall see further examples as 
we go along. 

As before, we say that an element v ∈ V is a unit vector if ||v|| = 1. If
v ∈ V and v ≠ O, then v/||v|| is a unit vector.

The following two identities follow directly from the definition of the 
length. 


The Pythagoras theorem. If v, w are perpendicular, then
||v + w||^2 = ||v||^2 + ||w||^2.
The parallelogram law. For any v, w we have

||v + w||^2 + ||v - w||^2 = 2||v||^2 + 2||w||^2.
The proofs are trivial. We give the first, and leave the second as an
exercise. For the first, we have

||v + w||^2 = <v + w, v + w> = <v, v> + 2<v, w> + <w, w>
            = ||v||^2 + ||w||^2,

because <v, w> = 0 when v, w are perpendicular.

Let w be an element of V such that ||w|| ≠ 0. For any v there exists a
unique number c such that v - cw is perpendicular to w. Indeed, for
v - cw to be perpendicular to w we must have

<v - cw, w> = 0,

whence <v, w> - <cw, w> = 0 and <v, w> = c<w, w>. Thus

c = <v, w> / <w, w>.

Conversely, letting c have this value shows that v - cw is perpendicular
to w. We call c the component of v along w.
In particular, if w is a unit vector, then the component of v along w is
simply
c = <v, w>.
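The formula for the component can be tried out with the ordinary dot product on R^3. The following sketch is an illustration only; the function names dot and component, and the sample vectors v and w, are assumptions and do not come from the text.

```python
from fractions import Fraction

def dot(v, w):
    """Ordinary dot product of two tuples of numbers."""
    return sum(a * b for a, b in zip(v, w))

def component(v, w):
    """Component c = <v, w>/<w, w> of v along w; w must not be O."""
    return Fraction(dot(v, w), dot(w, w))

v = (1, 2, -3)   # assumed sample vectors with integer entries
w = (1, 1, 2)

c = component(v, w)
residual = tuple(a - c * b for a, b in zip(v, w))   # the vector v - cw
check = dot(residual, w)                            # should be 0
```

For these sample vectors c = -1/2, and the scalar product of v - cw with w vanishes, as the argument above requires.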


Example 4. Let V = R^n with the usual scalar product, i.e. the dot
product. If E_i is the i-th unit vector, and X = (x_1, ..., x_n) then the
component of X along E_i is simply

X · E_i = x_i,
that is, the i-th component of X.


Example 5. Let V be the space of continuous functions on [-π, π].
Let f be the function given by f(x) = sin kx, where k is some integer > 0.
Then

||f|| = √<f, f> = ( ∫_{-π}^{π} sin^2 kx dx )^(1/2) = √π.

If g is any continuous function on [-π, π], then the component of g
along f is also called the Fourier coefficient of g with respect to f, and is
equal to

<g, f> / <f, f> = (1/π) ∫_{-π}^{π} g(x) sin kx dx.

As with the case of n-space, we define the projection of v along w to
be the vector cw, because of our usual picture:


Figure 1


Exactly the same arguments which we gave in Chapter I can now be 
used to get the Schwarz inequality, namely: 


Theorem 1.1. For all v, w ∈ V we have
|<v, w>| ≤ ||v|| ||w||.

Proof. If w = O, then both sides are equal to 0 and our inequality is
obvious. Next, assume that w ≠ O. Let c be the component of v along
w. We write

v = v - cw + cw.

Then v - cw is perpendicular to cw, so by Pythagoras,

||v||^2 = ||v - cw||^2 + ||cw||^2
        = ||v - cw||^2 + |c|^2 ||w||^2.

Therefore |c|^2 ||w||^2 ≤ ||v||^2 and taking square roots yields
|c| ||w|| ≤ ||v||.

But c = <v, w>/||w||^2. Then one factor ||w|| cancels, and cross multiplying
by ||w|| yields

|<v, w>| ≤ ||v|| ||w||,
thereby proving the theorem.

Theorem 1.2. If v, w ∈ V, then
||v + w|| ≤ ||v|| + ||w||.


Proof. Exactly the same as that of the analogous theorem in Chapter
I, §4.

Let v_1, ..., v_n be non-zero elements of V which are mutually
perpendicular, that is <v_i, v_j> = 0 if i ≠ j. Let c_i be the component of v
along v_i. Then

v - c_1 v_1 - ... - c_n v_n

is perpendicular to v_1, ..., v_n. To see this, all we have to do is to take the
product with v_j for any j. All the terms involving <v_i, v_j> will give 0 if
i ≠ j, and we shall have two remaining terms

<v, v_j> - c_j <v_j, v_j>

which cancel. Thus subtracting linear combinations as above
orthogonalizes v with respect to v_1, ..., v_n. The next theorem shows that
c_1 v_1 + ... + c_n v_n gives the closest approximation to v as a linear
combination of v_1, ..., v_n.


Theorem 1.3. Let v_1, ..., v_n be vectors which are mutually perpendicular,
and such that ||v_i|| ≠ 0 for all i. Let v be an element of V, and let c_i be
the component of v along v_i. Let a_1, ..., a_n be numbers. Then

||v - Σ_{k=1}^n c_k v_k|| ≤ ||v - Σ_{k=1}^n a_k v_k||.


Proof. We know that
v - Σ_{k=1}^n c_k v_k

is perpendicular to each v_i, i = 1, ..., n. Hence it is perpendicular to any
linear combination of v_1, ..., v_n. Now we have:

||v - Σ a_k v_k||^2 = ||v - Σ c_k v_k + Σ (c_k - a_k) v_k||^2
                    = ||v - Σ c_k v_k||^2 + ||Σ (c_k - a_k) v_k||^2

by the Pythagoras theorem. This proves that

||v - Σ c_k v_k||^2 ≤ ||v - Σ a_k v_k||^2,

and thus our theorem is proved.
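Theorem 1.3 can be observed numerically in R^3 with the dot product. The sketch below is only an illustration under stated assumptions: the two perpendicular vectors, the vector v, the grid of trial coefficients and the helper names are all chosen here and do not come from the text.

```python
from fractions import Fraction

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def combination(coeffs, vectors):
    """Return the linear combination sum of coeffs[k] * vectors[k]."""
    dim = len(vectors[0])
    return tuple(sum(c * vec[i] for c, vec in zip(coeffs, vectors))
                 for i in range(dim))

def squared_distance(v, coeffs, vectors):
    """Return ||v - sum of coeffs[k] * vectors[k]||^2."""
    approx = combination(coeffs, vectors)
    diff = tuple(a - b for a, b in zip(v, approx))
    return dot(diff, diff)

vectors = [(1, 1, 0), (1, -1, 0)]   # assumed mutually perpendicular vectors
v = (3, 1, 4)                       # assumed sample vector

# Components of v along each of the perpendicular vectors.
c = [Fraction(dot(v, u), dot(u, u)) for u in vectors]
best = squared_distance(v, c, vectors)

# Squared distances for other choices of the coefficients a_1, a_2.
trials = [squared_distance(v, (a1, a2), vectors)
          for a1 in range(-3, 4)
          for a2 in range(-3, 4)]
```

None of the trial coefficients gives a smaller distance than the components c_1, c_2, in agreement with the theorem. A finite grid of trials is of course not a proof.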


Example 6. Consider the vector space V of all continuous functions
on the interval [0, 2π]. Let

g_k(x) = cos kx, for k = 0, 1, 2, ....
We use the scalar product

<f, g> = ∫_0^{2π} f(x)g(x) dx.

Then it is easily verified that

||g_0|| = √(2π) and ||g_k|| = √π for k > 0.

The Fourier coefficient of f with respect to g_k is

c_k = (1/π) ∫_0^{2π} f(x) cos kx dx, for k > 0.

If we take v_k = g_k for k = 0, 1, ..., n then Theorem 1.3 tells us that the
linear combination

c_0 + c_1 cos x + c_2 cos 2x + ... + c_n cos nx

gives the best approximation to the function f among all possible linear
combinations
a_0 + a_1 cos x + ... + a_n cos nx

with arbitrary real numbers a_0, a_1, ..., a_n. Such a sum is called a partial
sum of the Fourier series.

Similarly, we could take linear combinations of the functions sin kx. 
This leads into the theory of Fourier series. We do not go into this 
deeper here. We merely wanted to point out the analogy, and the useful- 
ness of the geometric language and formalism in dealing with these 
objects. 
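For readers who wish to experiment, the Fourier coefficients of Example 6 can be approximated numerically. The sketch below is an illustration under several assumptions that are not in the text: the integral is replaced by a midpoint-rule sum with 2000 steps, the sample function is f(x) = x^2, and the names integrate, scalar, fourier_coefficient and squared_error are chosen here.

```python
import math

def integrate(func, a, b, steps=2000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

def scalar(f, g):
    """Approximate scalar product <f, g> on the interval [0, 2*pi]."""
    return integrate(lambda x: f(x) * g(x), 0.0, 2 * math.pi)

def g(k):
    return lambda x: math.cos(k * x)

def fourier_coefficient(f, k):
    """Component of f along g_k, that is <f, g_k>/<g_k, g_k>."""
    gk = g(k)
    return scalar(f, gk) / scalar(gk, gk)

def f(x):
    return x * x          # assumed sample function

n = 3
c = [fourier_coefficient(f, k) for k in range(n + 1)]

def squared_error(coeffs):
    """Approximate ||f - (a_0 + a_1 cos x + ... + a_n cos nx)||^2."""
    def diff(x):
        return f(x) - sum(a * math.cos(k * x) for k, a in enumerate(coeffs))
    return scalar(diff, diff)

best = squared_error(c)
other = squared_error([a + 0.1 for a in c])   # perturbed coefficients
```

Integrating by parts gives c_0 = 4π^2/3 and c_k = 4/k^2 for k > 0 for this f, which the computed values should approximate. The perturbed coefficients give a larger error, as Theorem 1.3 predicts.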


The next theorem is known as the Bessel inequality.


Theorem 1.4. If v_1, ..., v_n are mutually perpendicular unit vectors, and if
c_i is the Fourier coefficient of v with respect to v_i, then

Σ_{i=1}^n c_i^2 ≤ ||v||^2.


Proof. We have
0 ≤ <v - Σ c_i v_i, v - Σ c_i v_i>
  = <v, v> - Σ 2c_i <v, v_i> + Σ c_i^2
  = <v, v> - Σ c_i^2,

because c_i = <v, v_i> when v_i is a unit vector.
From this our inequality follows.

Exercises VI, §1


1. Let V be a vector space with a positive definite scalar product. Let v_1, ..., v_r
be non-zero elements of V which are mutually perpendicular, meaning that
<v_i, v_j> = 0 if i ≠ j. Show that v_1, ..., v_r are linearly independent.


The following exercise gives an important example of a scalar product. 


2. Let A be a symmetric n × n matrix. Given two column vectors X, Y ∈ R^n,
define
<X, Y> = ᵗX A Y.

(a) Show that this symbol satisfies the first three properties of a scalar
product.

(b) Give an example of a 1 × 1 matrix and a non-zero 2 × 2 matrix such
that the fourth property is not satisfied. If this fourth property is
satisfied, that is ᵗX A X > 0 for all X ≠ O, then the matrix A is called positive
definite.

(c) Give an example of a 2 × 2 matrix which is symmetric and positive
definite.

(d) Let a > 0, and let

    A = ( a  b )
        ( b  d )

Prove that A is positive definite if and only if ad - b^2 > 0. [Hint: Let
X = ᵗ(x, y) and complete the square in the expression ᵗX A X.]

(e) If a < 0 show that A is not positive definite.


3. Determine whether the following matrices are positive definite. 
3 —1 b —2 1 
()| 4/5 (i, 5 
4 1 4 4 
et 2) @( 1) 
4 1 f 4 -1 
©) {1 jo O (1 qo 
The trace of a matrix 


4. Let A be an n × n matrix. Define the trace of A to be the sum of the
diagonal elements. Thus if A = (a_ij), then

tr(A) = Σ_{i=1}^n a_ii.

For instance, if A is a 2 × 2 matrix whose diagonal elements are 1 and 4,
then tr(A) = 1 + 4 = 5. If

    A = ( 1  -1  5 )
        ( 2   1  3 )
        ( 1  -4  7 )

then tr(A) = 9. Compute the trace of the following matrices:


1 7 3 di d <p | 4 
(ay){-1 s 2 [|l 1 4 1 ol 3 4 4] 
2" 44 ^ ud a 25° 2 6 


5. (a) For any square matrix A show that tr(A) = tr(ᵗA).
(b) Show that the trace is a linear map.


6. If A is a symmetric square matrix, show that tr(AA) ≥ 0, and = 0 if and only if
A = O.
7. Let A, B be the indicated matrices. Show that
tr(AB) = tr(BA).

(a) A = ( 1  -1  1 )      B = (  3  1  2 )
        ( 2   4  1 )          (  1  1  0 )
        ( 3   0  1 )          ( -1  2  1 )

(b) A = (  1  7   3 )     B = (  3  -2  4 )
        ( -1  5   2 )         (  1   4  1 )
        (  2  3  -4 )         ( -7  -3  2 )


8. (a) Prove in general that if A, B are square n × n matrices, then
tr(AB) = tr(BA).

(b) If C is an n × n matrix which has an inverse, then tr(C^(-1)AC) = tr(A).


9. Let V be the vector space of symmetric n × n matrices. For A, B ∈ V define
the symbol

<A, B> = tr(AB),

where tr is the trace (sum of the diagonal elements). Show that the previous
properties in particular imply that this defines a positive definite scalar
product on V.


Exercises 10 through 13 deal with the scalar product in the context of 
calculus. 


10. Let V be the space of continuous functions on [0, 2π], and let the scalar
product be given by the integral over this interval as in the text, that is

<f, g> = ∫_0^{2π} f(x)g(x) dx.
Let g_n(x) = cos nx for n ≥ 0 and h_m(x) = sin mx for m ≥ 1.

(a) Show that ||g_0|| = √(2π), ||g_n|| = ||h_n|| = √π for n ≥ 1.
(b) Show that g_n ⊥ g_m if m ≠ n and that g_n ⊥ h_m for all m, n. Hint: Use
formulas like
sin A cos B = (1/2)[sin(A + B) + sin(A - B)]
cos A cos B = (1/2)[cos(A + B) + cos(A - B)].


11. Let f(x) = x on the interval [0, 2π]. Find <f, g_n> and <f, h_n> for the
functions g_n, h_n of Exercise 10. Find the Fourier coefficients of f with respect to
g_n and h_n.


12. Same question as in Exercise 11 if f(x) = x^2. (Exercises 10 through 13 give you
a review of some elementary integrals from calculus.)


13. (a) Let f(x) = x on the interval [0, 2π]. Find ||f||.
(b) Let f(x) = x^2 on the same interval. Find ||f||.


VI, §2. Orthogonal Bases


Let V be a vector space with a positive definite scalar product throughout
this section. A basis {v_1, ..., v_n} of V is said to be orthogonal if its
elements are mutually perpendicular, i.e. if <v_i, v_j> = 0 whenever i ≠ j. If
in addition each element of the basis has norm 1, then the basis is called
orthonormal.


Example 1. The standard unit vectors

E_1, ..., E_n of R^n

form an orthonormal basis of R^n. Indeed, each has norm 1, and they are
mutually orthogonal, that is

E_i · E_j = 0 if i ≠ j.
Of course there are many other orthonormal bases of R^n.
This example is typical in the following sense.
Let {e_1, ..., e_n} be an orthonormal basis of V. Any vector v ∈ V can be
written in terms of coordinates
v = x_1 e_1 + ... + x_n e_n with x_i ∈ R.

Let w be another element of V, and write

w = y_1 e_1 + ... + y_n e_n with y_i ∈ R.
Then
<v, w> = <x_1 e_1 + ... + x_n e_n, y_1 e_1 + ... + y_n e_n>

       = Σ_{i,j=1}^n <x_i e_i, y_j e_j>

       = Σ_{i=1}^n x_i y_i

because <e_i, e_j> = 0 for i ≠ j and <e_i, e_i> = 1. Hence if X is the
coordinate n-tuple of v and Y the coordinate n-tuple of w, then

<v, w> = X · Y


so the scalar product is given precisely as the dot product of the 
coordinates. This is one of the uses of orthonormal bases: to identify 
the scalar product with the old-fashioned dot product. 


Example 2. Consider R^2. Let
A = (1, 1) and B = (1, -1).

Then A · B = 0, so A is orthogonal to B, and A, B are linearly independent.
Therefore they form a basis of R^2, and in fact they form an orthogonal
basis of R^2. To get an orthonormal basis from them, we divide
each by its norm, so an orthonormal basis is given by

(1/√2, 1/√2) and (1/√2, -1/√2).

In general, suppose we have a subspace W of R^n, and let A_1, ..., A_r be
any basis of W. We want to get an orthogonal basis of W. We follow a
stepwise orthogonalization process. We start with A_1 = B_1. Then we
take A_2 and subtract its projection on B_1 to get a vector B_2. Then we
take A_3 and subtract its projections on B_1 and B_2 to get a vector B_3.
Then we take A_4 and subtract its projections on B_1, B_2, B_3 to get a
vector B_4. We continue in this way. This will eventually lead to an
orthogonal basis of W.

We state this as a theorem and prove it in the context of vector 
spaces with a scalar product. 


Theorem 2.1. Let V be a finite dimensional vector space, with a positive
definite scalar product. Let W be a subspace of V, and let {w_1, ..., w_m}
be an orthogonal basis of W. If W ≠ V, then there exist elements
w_(m+1), ..., w_n of V such that {w_1, ..., w_n} is an orthogonal basis of V.




Proof. The method of proof is as important as the theorem, and is
called the Gram-Schmidt orthogonalization process. We know from
Chapter III, §3 that we can find elements v_(m+1), ..., v_n of V such that

{w_1, ..., w_m, v_(m+1), ..., v_n}

is a basis of V. Of course, it is not an orthogonal basis. Let W_(m+1) be
the space generated by w_1, ..., w_m, v_(m+1). We shall first obtain an
orthogonal basis of W_(m+1). The idea is to take v_(m+1) and subtract from it its
projection along w_1, ..., w_m. Thus we let

c_1 = <v_(m+1), w_1> / <w_1, w_1>, ..., c_m = <v_(m+1), w_m> / <w_m, w_m>.

Let
w_(m+1) = v_(m+1) - c_1 w_1 - ... - c_m w_m.

Then w_(m+1) is perpendicular to w_1, ..., w_m. Furthermore, w_(m+1) ≠ O
(otherwise v_(m+1) would be linearly dependent on w_1, ..., w_m), and v_(m+1) lies
in the space generated by w_1, ..., w_(m+1) because

v_(m+1) = w_(m+1) + c_1 w_1 + ... + c_m w_m.

Hence {w_1, ..., w_(m+1)} is an orthogonal basis of W_(m+1). We can now
proceed by induction, showing that the space W_(m+s) generated by

w_1, ..., w_m, v_(m+1), ..., v_(m+s)
has orthogonal basis
{w_1, ..., w_(m+1), ..., w_(m+s)}

with s = 1, ..., n - m. This concludes the proof.


Corollary 2.2. Let V be a finite dimensional vector space with a positive
definite scalar product. Assume that V ≠ {O}. Then V has an orthogonal
basis.


Proof. By hypothesis, there exists an element v_1 of V such that
v_1 ≠ O. We let W be the subspace generated by v_1, and apply the
theorem to get the desired basis.

We summarize the procedure of Theorem 2.1 once more. Suppose we
are given an arbitrary basis {v_1, ..., v_n} of V. We wish to orthogonalize
it. We proceed as follows. We let

v'_1 = v_1,

v'_2 = v_2 - (<v_2, v'_1> / <v'_1, v'_1>) v'_1,

v'_3 = v_3 - (<v_3, v'_2> / <v'_2, v'_2>) v'_2 - (<v_3, v'_1> / <v'_1, v'_1>) v'_1,

...

v'_n = v_n - (<v_n, v'_(n-1)> / <v'_(n-1), v'_(n-1)>) v'_(n-1) - ... - (<v_n, v'_1> / <v'_1, v'_1>) v'_1.

Then {v'_1, ..., v'_n} is an orthogonal basis.

Given an orthogonal basis, we can always obtain an orthonormal
basis by dividing each vector by its norm.


Example 3. Find an orthonormal basis for the vector space generated
by the vectors (1, 1, 0, 1), (1, -2, 0, 0), and (1, 0, -1, 2).
Let us denote these vectors by A, B, C. Let

B' = B - ((B · A) / (A · A)) A.

In other words, we subtract from B its projection along A. Then B' is
perpendicular to A. We find

B' = (1/3)(4, -5, 0, 1).
Now we subtract from C its projection along A and B', and thus we let

C' = C - ((C · A) / (A · A)) A - ((C · B') / (B' · B')) B'.

Since A and B' are perpendicular, taking the scalar product of C' with A
and B' shows that C' is perpendicular to both A and B'. We find

C' = (1/7)(-4, -2, -7, 6).

The vectors A, B', C' are non-zero and mutually perpendicular. They lie
in the space generated by A, B, C. Hence they constitute an orthogonal
basis for that space. If we wish an orthonormal basis, then we divide
these vectors by their norm, and thus obtain

A / ||A|| = (1/√3)(1, 1, 0, 1),

B' / ||B'|| = (1/√42)(4, -5, 0, 1),

C' / ||C'|| = (1/√105)(-4, -2, -7, 6),

as an orthonormal basis.
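The orthogonalization procedure used in Example 3 is easy to carry out by machine. The following is a minimal sketch for the dot product on R^n with exact fractions; the function name gram_schmidt is an assumption, and the input vectors are assumed to be linearly independent (otherwise a division by zero occurs).

```python
from fractions import Fraction

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def gram_schmidt(basis):
    """Orthogonalize a list of linearly independent vectors.

    From each vector we subtract its projections along the vectors
    already obtained, as in the procedure of Theorem 2.1.
    """
    orthogonal = []
    for v in basis:
        v = [Fraction(x) for x in v]
        new = list(v)
        for u in orthogonal:
            c = dot(v, u) / dot(u, u)      # component of v along u
            new = [a - c * b for a, b in zip(new, u)]
        orthogonal.append(new)
    return orthogonal

A = (1, 1, 0, 1)
B = (1, -2, 0, 0)
C = (1, 0, -1, 2)
A1, B1, C1 = gram_schmidt([A, B, C])

# Squares of the norms; the norms themselves are the square roots.
norms_squared = [dot(w, w) for w in (A1, B1, C1)]
```

Applied to the vectors A, B, C of Example 3, this returns A, B' = (1/3)(4, -5, 0, 1) and C' = (1/7)(-4, -2, -7, 6). Dividing each vector by its norm then gives the orthonormal basis.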


Example 4. Find an orthogonal basis for the space of solutions of the
linear equation

3x - 2y + z = 0.

First we find a basis, not necessarily orthogonal. For instance, we
give z an arbitrary value, say z = 1. Thus we have to satisfy

3x - 2y = -1.
By inspection, we let x = 1, y = 2 or x = 3, y = 5, that is
A = (1, 2, 1) and B = (3, 5, 1).
Then it is easily verified that A, B are linearly independent. By Theorem
4.3 of Chapter 4, the space of solutions has dimension 2, so A, B form a
basis of that space of solutions. To get an orthogonal basis, we start
with A. Then we let

C = B - projection of B along A
  = B - ((B · A) / (A · A)) A
  = (3, 5, 1) - (7/3)(1, 2, 1)
  = (2/3, 1/3, -4/3).

Then {A, C} is an orthogonal basis of the space of solutions. It is
sometimes convenient to get rid of the denominator. We may use

A = (1, 2, 1) and D = (2, 1, -4)
equally well for an orthogonal basis of that space. As a check, substitute
back in the original equation to see that these vectors give solutions, and
also verify that A · D = 0, so that they are perpendicular to each other.


Example 5. Find an orthogonal basis for the space of solutions of the
homogeneous equations

3x - 2y + z + w = 0,
 x +  y     + 2w = 0.

Let W be the space of solutions in R^4. Then W is the space orthogonal
to the two vectors

(3, -2, 1, 1) and (1, 1, 0, 2).

These are obviously linearly independent (by any number of arguments,
you can prove at once that the matrix

    ( 3  -2  1  1 )
    ( 1   1  0  2 )

has rank 2, for instance). Hence
dim W = 4 - 2 = 2.

Next we find a basis for the space of solutions. Let us put w = 1, and
solve

3x - 2y + z = -1,
 x +  y     = -2,

by ordinary elimination. If we put y = 0, then we get a solution with
x = -2, and

z = -1 - 3x + 2y = 5.

If we put y = 1, then we get a solution with x = -3, and
z = -1 - 3x + 2y = 10.

Thus we get the two solutions

A = (-2, 0, 5, 1) and B = (-3, 1, 10, 1).
(As a check, substitute back in the original system of equations to see
that no computational error has been made.) These two solutions are
linearly independent, because for instance the matrix

    ( -2  0 )
    ( -3  1 )

has rank 2. Hence {A, B} is a basis for the space of solutions. To find
an orthogonal basis, we orthogonalize B, to get

B' = B - ((B · A) / (A · A)) A = (-3, 1, 10, 1) - (19/10)(-2, 0, 5, 1).

We can also clear denominators, and let C = 10B', so

C = (-30, 10, 100, 10) - (-38, 0, 95, 19)
  = (8, 10, 5, -9).
Then {A, C} is an orthogonal basis for the space of solutions. (Again,
check by substituting back in the system of equations, and also check
perpendicularity by seeing directly that A · C = 0.)
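The checks suggested in Example 5 can be carried out mechanically. The short sketch below uses exact fractions; the helper name satisfies_system is an assumption introduced only for this illustration.

```python
from fractions import Fraction

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def satisfies_system(v):
    """Check the two homogeneous equations of Example 5."""
    x, y, z, w = v
    return 3 * x - 2 * y + z + w == 0 and x + y + 2 * w == 0

A = (-2, 0, 5, 1)
B = (-3, 1, 10, 1)

ratio = Fraction(dot(B, A), dot(A, A))              # component of B along A
B_prime = tuple(b - ratio * a for a, b in zip(A, B))
C = tuple(10 * t for t in B_prime)                  # clear denominators
```

The component of B along A comes out as 19/10, the vector C agrees with the one found by hand, and both A and C satisfy the system and are perpendicular to each other.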


One can also find an orthogonal basis without guessing solutions by 
inspection or elimination at the start, as follows. 


Example 6. Find a basis for the space of solutions of the equation

\[ 3x - 2y + z = 0. \]

The space of solutions is the space orthogonal to the vector $(3, -2, 1)$ and hence has dimension 2. There are of course many bases for this space. To find one, we first extend $(3, -2, 1) = A$ to a basis of $\mathbf{R}^3$. We do this by selecting vectors $B$, $C$ such that $A$, $B$, $C$ are linearly independent. For instance, take

\[ B = (0, 1, 0) \quad\text{and}\quad C = (0, 0, 1). \]

Then $A$, $B$, $C$ are linearly independent. To see this, we proceed as usual. If $a$, $b$, $c$ are numbers such that

\[ aA + bB + cC = O, \]

then

\[ \begin{aligned} 3a &= 0, \\ -2a + b &= 0, \\ a + c &= 0. \end{aligned} \]

This is easily solved to see that $a = b = c = 0$, so $A$, $B$, $C$ are linearly independent. Now we must orthogonalize these vectors.

Let

\[ B' = B - \frac{\langle B, A\rangle}{\langle A, A\rangle} A = \frac{1}{7}(3, 5, 1), \]

\[ C' = C - \frac{\langle C, A\rangle}{\langle A, A\rangle} A - \frac{\langle C, B'\rangle}{\langle B', B'\rangle} B' = (0, 0, 1) - \frac{1}{14}(3, -2, 1) - \frac{1}{35}(3, 5, 1). \]

Then $\{B', C'\}$ is an orthogonal basis for the space of solutions of the given equation. As you see, this procedure is slightly longer than the one used by guessing first, and involves two orthogonalizations rather than one as in Example 4.
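The two orthogonalizations of Example 6 can be sketched as follows. This is only an illustration under the assumption that the vectors are given as coordinate tuples. The function name `gram_schmidt` is introduced here and is not notation from the text, and no normalization is performed.

```python
from fractions import Fraction

def dot(u, v):
    """Standard scalar product of two coordinate vectors."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Return mutually perpendicular vectors spanning the same space.

    Each vector has its projections along the previously constructed
    vectors subtracted; the results are not normalized.
    """
    basis = []
    for v in vectors:
        v = [Fraction(x) for x in v]
        w = list(v)
        for u in basis:
            coeff = dot(v, u) / dot(u, u)
            w = [wi - coeff * ui for wi, ui in zip(w, u)]
        basis.append(w)
    return basis

A, B_prime, C_prime = gram_schmidt([(3, -2, 1), (0, 1, 0), (0, 0, 1)])
print(B_prime)
print(C_prime)
```

Since the first vector is $A = (3, -2, 1)$, the two remaining vectors are perpendicular to $A$, and hence solve $3x - 2y + z = 0$.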

In Theorem 2.1 we obtained an orthogonal basis for V by starting with an orthogonal basis for a subspace. Let us now look at the situation more symmetrically.


Theorem 2.3. Let $V$ be a vector space of dimension $n$, with a positive definite scalar product. Let $\{w_1, \ldots, w_r, u_1, \ldots, u_s\}$ be an orthogonal basis for $V$. Let $W$ be the subspace generated by $w_1, \ldots, w_r$ and let $U$ be the subspace generated by $u_1, \ldots, u_s$. Then $U = W^\perp$, or by symmetry, $W = U^\perp$. Hence for any subspace $W$ of $V$ we have the relation

\[ \dim W + \dim W^\perp = \dim V. \]

Proof. We shall prove that $W^\perp \subset U$ and $U \subset W^\perp$, so $W^\perp = U$.

First let $v \in W^\perp$. There exist numbers $a_i$ ($i = 1, \ldots, r$) and $b_j$ ($j = 1, \ldots, s$) such that

\[ v = \sum_{i=1}^{r} a_i w_i + \sum_{j=1}^{s} b_j u_j. \]

Since $v$ is perpendicular to all elements of $W$, we have for any $k = 1, \ldots, r$:

\[ 0 = v \cdot w_k = \sum_{i=1}^{r} a_i w_i \cdot w_k + \sum_{j=1}^{s} b_j u_j \cdot w_k = a_k w_k \cdot w_k \]

because $w_i \cdot w_k = 0$ if $i \neq k$ and $u_j \cdot w_k = 0$ for all $j$. Since $w_k \cdot w_k \neq 0$ it follows that $a_k = 0$ for all $k = 1, \ldots, r$ so $v$ is a linear combination of $u_1, \ldots, u_s$ and $v \in U$. Thus $W^\perp \subset U$.

Conversely, let $v \in U$, so $v$ is a linear combination of $u_1, \ldots, u_s$. Since $\{w_1, \ldots, w_r, u_1, \ldots, u_s\}$ is an orthogonal basis of $V$ it follows that each $u_j$ is perpendicular to $W$ so $v$ itself is perpendicular to $W$, so $U \subset W^\perp$. Therefore we have proved that $U = W^\perp$.

Finally, Theorem 2.1 shows that the previous situation applies to any subspace $W$ of $V$, and by the definition of dimension,

\[ \dim V = r + s = \dim W + \dim W^\perp, \]

thus concluding the proof of the theorem.


Example 7. Consider $\mathbf{R}^3$. Let $A$, $B$ be two linearly independent vectors in $\mathbf{R}^3$. Then the space of vectors which are perpendicular to both $A$ and $B$ is a 1-dimensional space. If $\{N\}$ is a basis for this space, any other basis for this space is of type $\{tN\}$, where $t$ is a number $\neq 0$.

Again in $\mathbf{R}^3$, let $N$ be a non-zero vector. The space of vectors perpendicular to $N$ is a 2-dimensional space, i.e. a plane, passing through the origin $O$.


Remark. Theorem 2.3 gives a new proof of the fact that the row rank of a matrix is equal to its column rank. Indeed, let $A = (a_{ij})$ be an $m \times n$ matrix. Let $S$ be the space of solutions of the equation $AX = O$, so $S = \operatorname{Ker} L_A$. By Theorem 3.2 of Chapter IV, we have

\[ \dim S + \text{column rank} = n, \]

because the image of $L_A$ is the space generated by the columns of $A$.

On the other hand, $S$ is the space of vectors in $\mathbf{R}^n$ perpendicular to the rows of $A$, so if $W$ is the row space then $S = W^\perp$. Therefore by Theorem 2.3 we get

\[ \dim S + \text{row rank} = n. \]

This proves that row rank = column rank. In some ways, this is a more satisfying and conceptual proof of the relation than with the row and column operations that we used before.
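As a numerical illustration of this remark (not a proof), one can compute the rank of a matrix and of its transpose by elimination and observe that they agree. The sketch below is an assumption-laden example: the matrix `M` is hypothetical, and `rank` and `transpose` are helper names introduced here.

```python
from fractions import Fraction

def rank(matrix):
    """Rank computed by Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    found = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((r for r in range(found, len(rows))
                      if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[found])]
        found += 1
    return found

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

# Hypothetical 3 x 4 matrix whose third row is the sum of the first two.
M = [[3, -2, 1, 1], [1, 1, 0, 2], [4, -1, 1, 3]]
print(rank(M), rank(transpose(M)))
```

Here the rank of `M` is its row rank and the rank of `transpose(M)` is its column rank, and $n$ minus either number gives the dimension of the space of solutions of $MX = O$.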
We conclude this section by pointing out some useful notation. Let $X, Y \in \mathbf{R}^n$, and view $X$, $Y$ as column vectors. Let $\langle\ ,\ \rangle$ denote the standard scalar product on $\mathbf{R}^n$. Thus by definition

\[ \langle X, Y\rangle = {}^tX Y. \]

Similarly, let $A$ be an $n \times n$ matrix. Then

\[ \langle X, AY\rangle = {}^tX A Y = {}^t({}^tA X) Y = \langle {}^tA X, Y\rangle. \]

Thus we obtain the formula

\[ \langle X, AY\rangle = \langle {}^tA X, Y\rangle. \]

The transpose of the matrix $A$ corresponds to transposing $A$ to ${}^tA$ from one side of the scalar product to the other. This notation is frequently used in applications, which is one of the reasons for mentioning it here.
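The formula relating $A$ and its transpose can be checked on a numerical instance. In the sketch below the matrix `A` and the vectors `X`, `Y` are hypothetical values chosen only for illustration, and vectors are plain Python lists.

```python
def dot(u, v):
    """Standard scalar product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def mat_vec(M, v):
    """Product of the matrix M with the column vector v."""
    return [dot(row, v) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[1, 2, 0], [3, -1, 4], [5, 0, 2]]   # hypothetical 3 x 3 matrix
X = [1, -2, 3]
Y = [4, 0, -1]

left = dot(X, mat_vec(A, Y))               # <X, AY>
right = dot(mat_vec(transpose(A), X), Y)   # <tA X, Y>
print(left, right)
```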


Exercises VI, §2


1. Find orthonormal bases for the subspaces of $\mathbf{R}^3$ generated by the following vectors:
(a) (1, 1, -1) and (1, 0, 1),
(b) (2, 1, 1) and (1, 3, -1).


2. Find an orthonormal basis for the subspace of $\mathbf{R}^4$ generated by the vectors (1, 2, 1, 0) and (1, 2, 3, 1).


3. Find an orthonormal basis for the subspace of $\mathbf{R}^4$ generated by (1, 1, 0, 0), (1, -1, 1, 1), and (-1, 0, 2, 1).


4. Find an orthogonal basis for the space of solutions of the following equations.
(a) $2x + y - z = 0$, $y + z = 0$
(b) $x - y + z = 0$
(c) $4x + 7y - \pi z = 0$, $2x - y + z = 0$
(d) $x + y + z = 0$, $x - y = 0$, $y + z = 0$


In the next exercises, we consider the vector space of continuous functions on the interval [0, 1]. We define the scalar product of two such functions $f$, $g$ by the rule

\[ \langle f, g\rangle = \int_0^1 f(t)g(t)\,dt. \]

5. Let $V$ be the subspace of functions generated by the two functions $f(t) = t$ and $g(t) = t^2$. Find an orthonormal basis for $V$.


6. Let $V$ be the subspace generated by the three functions $1$, $t$, $t^2$ (where 1 is the constant function). Find an orthonormal basis for $V$.


7. Let $V$ be a finite dimensional vector space with a positive definite scalar product. Let $W$ be a subspace. Show that

\[ V = W + W^\perp \quad\text{and}\quad W \cap W^\perp = \{O\}. \]

In the terminology of the preceding chapter, this means that $V$ is the direct sum of $W$ and its orthogonal complement. [Use Theorem 2.3.]


8. In Exercise 7, show that $(W^\perp)^\perp = W$. Why is this immediate from Theorem 2.3?


9. (a) Let $V$ be the space of symmetric $n \times n$ matrices. For $A, B \in V$ define

\[ \langle A, B\rangle = \operatorname{tr}(AB), \]

where tr is the trace (sum of diagonal elements). Show that this satisfies all the properties of a positive definite scalar product. (You might already have done this as an exercise in a previous section.)

(b) Let $W$ be the subspace of matrices $A$ such that $\operatorname{tr}(A) = 0$. What is the dimension of the orthogonal complement of $W$, relative to the scalar product in part (a)? Give an explicit basis for this orthogonal complement.


10. Let $A$ be a symmetric $n \times n$ matrix. Let $X, Y \in \mathbf{R}^n$ be eigenvectors for $A$, that is suppose that there exist numbers $a$, $b$ such that $AX = aX$ and $AY = bY$. Assume that $a \neq b$. Prove that $X$, $Y$ are perpendicular.


VI, §3. Bilinear Maps and Matrices

Let $U$, $V$, $W$ be vector spaces, and let

\[ g: U \times V \to W \]

be a map. We say that $g$ is bilinear if for each fixed $u \in U$ the map

\[ v \mapsto g(u, v) \]

is linear, and for each fixed $v \in V$, the map

\[ u \mapsto g(u, v) \]

is linear. The first condition written out reads

\[ g(u, v_1 + v_2) = g(u, v_1) + g(u, v_2), \qquad g(u, cv) = cg(u, v), \]

and similarly for the second condition on the other side.


Example. Let $A$ be an $m \times n$ matrix, $A = (a_{ij})$. We can define a map

\[ g_A : \mathbf{R}^m \times \mathbf{R}^n \to \mathbf{R} \]

by letting

\[ g_A(X, Y) = {}^tX A Y, \]

which written out looks like this:

\[ (x_1, \ldots, x_m) \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}. \]

Our vectors $X$ and $Y$ are supposed to be column vectors, so that ${}^tX$ is a row vector, as shown. Then ${}^tXA$ is a row vector, and ${}^tXAY$ is a $1 \times 1$ matrix, i.e. a number. Thus $g_A$ maps pairs of vectors into the reals. Such a map $g_A$ satisfies properties similar to those of a scalar product. If we fix $X$, then the map $Y \mapsto {}^tXAY$ is linear, and if we fix $Y$, then the map $X \mapsto {}^tXAY$ is also linear. In other words, say fixing $X$, we have

\[ g_A(X, Y + Y') = g_A(X, Y) + g_A(X, Y'), \]

\[ g_A(X, cY) = c\,g_A(X, Y), \]

and similarly on the other side. This is merely a reformulation of properties of multiplication of matrices, namely

\[ {}^tX A(Y + Y') = {}^tXAY + {}^tXAY', \]

\[ {}^tX A(cY) = c\,{}^tXAY. \]

It is convenient to write out the multiplication ${}^tXAY$ as a sum. Note that

\[ j\text{-th component of } {}^tXA = \sum_{i=1}^{m} x_i a_{ij}, \]

and thus

\[ {}^tXAY = \sum_{j=1}^{n} \sum_{i=1}^{m} x_i a_{ij} y_j = \sum_{j=1}^{n} \sum_{i=1}^{m} a_{ij} x_i y_j. \]
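The double sum just written can be compared with the matrix product computed in two steps. The following sketch uses a hypothetical $2 \times 3$ matrix and hypothetical vectors. The names `row_times_matrix` and `g` are introduced here for illustration and are not notation from the text.

```python
def row_times_matrix(X, A):
    """Components of the row vector tX A: the j-th one is sum_i x_i a_ij."""
    return [sum(X[i] * A[i][j] for i in range(len(X)))
            for j in range(len(A[0]))]

def g(A, X, Y):
    """The double sum of a_ij x_i y_j over all i and j."""
    return sum(A[i][j] * X[i] * Y[j]
               for i in range(len(X)) for j in range(len(Y)))

A = [[1, 0, 2], [-1, 3, 1]]   # hypothetical 2 x 3 matrix
X = [2, 1]
Y = [1, -1, 4]
Y2 = [0, 5, -2]

via_product = sum(c * y for c, y in zip(row_times_matrix(X, A), Y))
via_double_sum = g(A, X, Y)
print(via_product, via_double_sum)

# Linearity in Y for fixed X, on this instance:
Y_sum = [a + b for a, b in zip(Y, Y2)]
print(g(A, X, Y_sum) == g(A, X, Y) + g(A, X, Y2))
```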


Example. Let

\[ A = \begin{pmatrix} 1 & 2 \\ 3 & -1 \end{pmatrix}. \]

If $X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ and $Y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$ then

\[ {}^tXAY = x_1y_1 + 2x_1y_2 + 3x_2y_1 - x_2y_2. \]

Theorem 3.1. Given a bilinear map $g: \mathbf{R}^m \times \mathbf{R}^n \to \mathbf{R}$, there exists a unique matrix $A$ such that $g = g_A$, i.e. such that

\[ g(X, Y) = {}^tXAY. \]

Proof. The statement of Theorem 3.1 is similar to the statement representing linear maps by matrices, and its proof is an extension of previous proofs. Remember that we used the standard bases for $\mathbf{R}^n$ to prove these previous results, and we used coordinates. We do the same here. Let $E^1, \ldots, E^m$ be the standard unit vectors for $\mathbf{R}^m$, and let $U^1, \ldots, U^n$ be the standard unit vectors for $\mathbf{R}^n$. We can then write any $X \in \mathbf{R}^m$ as

\[ X = \sum_{i=1}^{m} x_i E^i \]

and any $Y \in \mathbf{R}^n$ as

\[ Y = \sum_{j=1}^{n} y_j U^j. \]

Then

\[ g(X, Y) = g(x_1E^1 + \cdots + x_mE^m,\ y_1U^1 + \cdots + y_nU^n). \]

Using the linearity on the left, we find

\[ g(X, Y) = \sum_{i=1}^{m} x_i\, g(E^i,\ y_1U^1 + \cdots + y_nU^n). \]

Using the linearity on the right, we find

\[ g(X, Y) = \sum_{i=1}^{m} \sum_{j=1}^{n} x_i y_j\, g(E^i, U^j). \]

Let

\[ a_{ij} = g(E^i, U^j). \]

Then we see that

\[ g(X, Y) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} x_i y_j, \]

which is precisely the expression we obtained for the product ${}^tXAY$, where $A$ is the matrix $(a_{ij})$. This proves that $g = g_A$ for the choice of $a_{ij}$ given above.

The uniqueness is also easy to see, and may be formulated as follows.

Uniqueness. If $A$, $B$ are $m \times n$ matrices such that for all vectors $X$, $Y$ (of the appropriate dimension) we have

\[ {}^tXAY = {}^tXBY, \]

then $A = B$.

Proof. Since the above relation holds for all vectors $X$, $Y$, it holds in particular for the unit vectors. Thus we apply the relation when $X = E^i$ and $Y = U^j$. Then the rule for multiplication of matrices shows that

\[ {}^tE^i A U^j = a_{ij} \quad\text{and}\quad {}^tE^i B U^j = b_{ij}. \]

Hence $a_{ij} = b_{ij}$ for all indices $i$, $j$. This shows that $A = B$.
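The proof is constructive: the matrix of a bilinear map is obtained by evaluating the map on pairs of unit vectors. The sketch below illustrates this under stated assumptions. The bilinear map `g` is a hypothetical example on $\mathbf{R}^2 \times \mathbf{R}^3$, and `standard_basis` and `matrix_of` are names chosen here.

```python
def standard_basis(n):
    """The standard unit vectors of R^n as lists."""
    return [[1 if i == k else 0 for i in range(n)] for k in range(n)]

def matrix_of(g, m, n):
    """Matrix (a_ij) with a_ij = g(E^i, U^j)."""
    return [[g(E, U) for U in standard_basis(n)] for E in standard_basis(m)]

def g(X, Y):
    # Hypothetical bilinear map on R^2 x R^3, written in coordinates.
    return 2*X[0]*Y[0] - X[0]*Y[2] + 4*X[1]*Y[1] + 5*X[1]*Y[2]

A = matrix_of(g, 2, 3)
print(A)

# Compare g(X, Y) with the double sum of a_ij x_i y_j on a sample pair.
X, Y = [3, -2], [1, 2, -1]
double_sum = sum(A[i][j] * X[i] * Y[j] for i in range(2) for j in range(3))
print(g(X, Y), double_sum)
```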


Remark. Bilinear maps can be added and multiplied by scalars. The sum of two bilinear maps is again bilinear, and the product by a scalar is again bilinear. Hence bilinear maps form a vector space. Verify the rules

\[ g_{A+B} = g_A + g_B \quad\text{and}\quad g_{cA} = c\,g_A. \]

Then Theorem 3.1 can be expressed by saying that the association

\[ A \mapsto g_A \]

is an isomorphism between the space of $m \times n$ matrices, and the space of bilinear maps from $\mathbf{R}^m \times \mathbf{R}^n$ into $\mathbf{R}$.


Application to calculus. If you have had the calculus of several variables, you have associated with a function $f$ of $n$ variables the matrix of second partial derivatives

\[ \left( \frac{\partial^2 f}{\partial x_i\, \partial x_j} \right). \]

This matrix may be viewed as the matrix associated with a bilinear map, which is called the Hessian. Note that this matrix is symmetric since it is proved that for sufficiently smooth functions, the partials commute, that is

\[ \frac{\partial^2 f}{\partial x_i\, \partial x_j} = \frac{\partial^2 f}{\partial x_j\, \partial x_i}. \]


Exercises VI, §3 


1. Let $A$ be an $n \times n$ matrix, and assume that $A$ is symmetric, i.e. $A = {}^tA$. Let $g_A: \mathbf{R}^n \times \mathbf{R}^n \to \mathbf{R}$ be its associated bilinear map. Show that

\[ g_A(X, Y) = g_A(Y, X) \]

for all $X, Y \in \mathbf{R}^n$, and thus that $g_A$ is a scalar product, i.e. satisfies conditions SP 1, SP 2, and SP 3.


2. Conversely, assume that $A$ is an $n \times n$ matrix such that

\[ g_A(X, Y) = g_A(Y, X) \]

for all $X$, $Y$. Show that $A$ is symmetric.


3. Write out in full in terms of coordinates the expression for ${}^tXAY$ when $A$ is the following matrix, and $X$, $Y$ are vectors of the corresponding dimension.


2 —3 4 1 —5 2 
ot e(1) (2 3 


"E =i sf m d - 25 
(d|-3 1 4 .e)| 3 1 1 0| 1 23 4 
2. 5 2 2 5 7 =f 3 


CHAPTER VII 


Determinants 


We have worked with vectors for some time, and we have often felt the 
need of a method to determine when vectors are linearly independent. 
Up to now, the only method available to us was to solve a system of 
linear equations by the elimination method. In this chapter, we shall 
exhibit a very efficient computational method to solve linear equations, 
and determine when vectors are linearly independent. 

The cases of 2 x 2 and 3 x 3 determinants will be carried out separa- 
tely in full, because the general case of n x n determinants involves nota- 
tion which adds to the difficulties of understanding determinants. Some 
proofs in the n x n case will be omitted. 


VII, §1. Determinants of Order 2


Before stating the general properties of an arbitrary determinant, we shall consider a special case.

Let

\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \]

be a $2 \times 2$ matrix. We define its determinant to be $ad - bc$. Thus the determinant is a number. We denote it by

\[ \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc. \]

For example, the determinant of the matrix

\[ \begin{pmatrix} 2 & 1 \\ 1 & 4 \end{pmatrix} \]

is equal to $2 \cdot 4 - 1 \cdot 1 = 7$. The determinant of

\[ \begin{pmatrix} -2 & -3 \\ 4 & 5 \end{pmatrix} \]

is equal to $(-2) \cdot 5 - (-3) \cdot 4 = -10 + 12 = 2$.
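The definition translates directly into a one-line function. The sketch below is only an illustration: the name `det2` and the convention of passing the four entries row by row are choices made here.

```python
def det2(a, b, c, d):
    """Determinant ad - bc of the 2 x 2 matrix with rows (a, b) and (c, d)."""
    return a * d - b * c

first = det2(2, 1, 1, 4)      # 2*4 - 1*1
second = det2(-2, -3, 4, 5)   # (-2)*5 - (-3)*4
print(first, second)
```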
The determinant can be viewed as a function of the matrix $A$. It can also be viewed as a function of its two columns. Let these be $A^1$ and $A^2$ as usual. Then we write the determinant as

\[ D(A), \quad \operatorname{Det}(A), \quad\text{or}\quad D(A^1, A^2). \]

The following properties are easily verified by direct computation, which you should carry out completely.


Property 1. As a function of the column vectors, the determinant is linear.


This means: suppose for instance $A^1 = C + C'$ is a sum of two columns. Then

\[ D(C + C', A^2) = D(C, A^2) + D(C', A^2). \]

If $x$ is a number, then

\[ D(xA^1, A^2) = xD(A^1, A^2). \]

A similar formula holds with respect to the second variable. The formula can be proved directly from the definition of the determinant. For instance, let $b'$, $d'$ be two numbers. Then

\[ \begin{aligned} \operatorname{Det}\begin{pmatrix} a & b + b' \\ c & d + d' \end{pmatrix} &= a(d + d') - c(b + b') \\ &= ad + ad' - cb - cb' \\ &= ad - bc + ad' - b'c \\ &= \operatorname{Det}\begin{pmatrix} a & b \\ c & d \end{pmatrix} + \operatorname{Det}\begin{pmatrix} a & b' \\ c & d' \end{pmatrix}. \end{aligned} \]

Furthermore, if $x$ is a number, then

\[ \operatorname{Det}\begin{pmatrix} xa & b \\ xc & d \end{pmatrix} = xad - xbc = x(ad - bc) = x \operatorname{Det}\begin{pmatrix} a & b \\ c & d \end{pmatrix}. \]

In the terminology of Chapter VI, §3 we may say that the determinant is bilinear.


Property 2. If the two columns are equal, then the determinant is equal to 0.


This is immediate, since by hypothesis, the determinant is $ab - ab = 0$.

Property 3. If $I$ is the unit matrix, $I = (E^1, E^2)$, then

\[ D(I) = D(E^1, E^2) = 1. \]

Again this is immediate from the definition $ad - bc$.

Using only the above three properties we can prove others as follows.


If one adds a scalar multiple of one column to the other, then the value of the determinant does not change.


In other words, let $x$ be a number. Then

\[ D(A^1 + xA^2, A^2) = D(A^1, A^2). \]

The proof is immediate, namely:

\[ \begin{aligned} D(A^1 + xA^2, A^2) &= D(A^1, A^2) + xD(A^2, A^2) && \text{by linearity} \\ &= D(A^1, A^2) && \text{by Property 2.} \end{aligned} \]

If the two columns are interchanged, then the determinant changes by a sign.


In other words, we have $D(A^2, A^1) = -D(A^1, A^2)$, or writing out the components,

\[ \operatorname{Det}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = -\operatorname{Det}\begin{pmatrix} b & a \\ d & c \end{pmatrix}. \]

One can of course prove this directly from the formula $ad - bc$. But let us also derive it from the property that if the two columns are equal, then the determinant is 0. We have:

\[ \begin{aligned} 0 &= D(A^1 + A^2, A^1 + A^2) && \text{(because each variable is equal to } A^1 + A^2\text{)} \\ &= D(A^1, A^1 + A^2) + D(A^2, A^1 + A^2) && \text{(by linearity in the first variable)} \\ &= D(A^1, A^1) + D(A^1, A^2) + D(A^2, A^1) + D(A^2, A^2) && \text{(by linearity in the second variable)} \\ &= D(A^1, A^2) + D(A^2, A^1). \end{aligned} \]

Thus we see that $D(A^2, A^1) = -D(A^1, A^2)$. Observe that this proof used only the linearity in each variable, and the fact that

\[ D(C, C) = 0 \]

if $C$ is a vector.


The determinant of $A$ is equal to the determinant of its transpose, i.e. $D(A) = D({}^tA)$.


Explicitly, we have

\[ \operatorname{Det}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \operatorname{Det}\begin{pmatrix} a & c \\ b & d \end{pmatrix}. \]

This formula comes from the formula $ad - bc$ for the determinant.


The vectors $A^1$, $A^2$ are linearly dependent if and only if the determinant is 0.


We shall give a proof which follows the same pattern as in the generalization to higher dimensional spaces. First suppose that $A^1$, $A^2$ are linearly dependent, so there is a linear relation

\[ xA^1 + yA^2 = O \]

with not both $x$, $y$ equal to 0. Say $x \neq 0$. Then we can solve

\[ A^1 = zA^2 \quad\text{where } z = -y/x. \]

Now we have

\[ D(A^1, A^2) = D(zA^2, A^2) = zD(A^2, A^2) = 0 \]

by using linearity and the property that if the two columns are equal, then the determinant is 0.

Conversely, suppose that $A^1$, $A^2$ are linearly independent. Then they must form a basis of $\mathbf{R}^2$, which has dimension 2. Hence we can express the unit vectors $E^1$, $E^2$ as linear combinations of $A^1$, $A^2$, say

\[ E^1 = xA^1 + yA^2 \quad\text{and}\quad E^2 = zA^1 + wA^2, \]

where $x$, $y$, $z$, $w$ are scalars. Now we have:

\[ \begin{aligned} 1 = D(E^1, E^2) &= D(xA^1 + yA^2, zA^1 + wA^2) \\ &= xzD(A^1, A^1) + xwD(A^1, A^2) + yzD(A^2, A^1) + ywD(A^2, A^2) \\ &= (xw - yz)D(A^1, A^2). \end{aligned} \]

Since this last product is 1, we must have $D(A^1, A^2) \neq 0$. This proves the desired assertion.
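The criterion just proved gives an immediate test for two vectors in the plane. The sketch below writes the determinant as a function of its two columns and checks, on hypothetical sample vectors, the properties derived above. The names `D` and `linearly_dependent` are used here only for illustration.

```python
def D(col1, col2):
    """Determinant of the 2 x 2 matrix whose columns are col1 and col2."""
    (a, c), (b, d) = col1, col2
    return a * d - b * c

def linearly_dependent(col1, col2):
    """Two vectors of R^2 are linearly dependent iff their determinant is 0."""
    return D(col1, col2) == 0

A1, A2 = (1, 2), (3, 5)       # hypothetical sample columns
x = 7
shifted = (A1[0] + x * A2[0], A1[1] + x * A2[1])   # A^1 + x A^2

print(D(A1, A2), D(shifted, A2))   # adding a multiple of a column
print(D(A2, A1))                   # interchanging the columns
print(linearly_dependent((1, 2), (3, 6)), linearly_dependent(A1, A2))
```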


Finally we prove the uniqueness of the determinant, by a method which will work in general.


Theorem 1.1. Let $\varphi$ be a function of two vector variables $A^1, A^2 \in \mathbf{R}^2$ such that:

$\varphi$ is bilinear, that is $\varphi$ is linear in each variable.

$\varphi(A^1, A^1) = 0$ for all $A^1 \in \mathbf{R}^2$.

$\varphi(E^1, E^2) = 1$ if $E^1$, $E^2$ are the standard unit vectors $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$.

Then $\varphi(A^1, A^2)$ is the determinant.


Proof. Write

\[ A^1 = aE^1 + cE^2 \quad\text{and}\quad A^2 = bE^1 + dE^2. \]

Then

\[ \begin{aligned} \varphi(A^1, A^2) &= \varphi(aE^1 + cE^2, bE^1 + dE^2) \\ &= ab\varphi(E^1, E^1) + ad\varphi(E^1, E^2) + cb\varphi(E^2, E^1) + cd\varphi(E^2, E^2) \\ &= ad\varphi(E^1, E^2) - bc\varphi(E^1, E^2) \\ &= (ad - bc)\varphi(E^1, E^2) \\ &= ad - bc. \end{aligned} \]

At each step we have used one of the properties proved previously. This proves the theorem.


VII, §2. 3 x 3 and n x n Determinants


We shall define determinants by induction, and give a formula for computing them at the same time. We deal with the $3 \times 3$ case.

We have already defined $2 \times 2$ determinants. Let

\[ A = (a_{ij}) = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \]

be a $3 \times 3$ matrix. We define its determinant according to the formula known as the expansion by a row, say the first row. That is, we define

\[ \operatorname{Det}(A) = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}. \]

We may describe this sum as follows. Let $A_{ij}$ be the matrix obtained from $A$ by deleting the $i$-th row and the $j$-th column. Then the sum expressing $\operatorname{Det}(A)$ can be written

\[ a_{11}\operatorname{Det}(A_{11}) - a_{12}\operatorname{Det}(A_{12}) + a_{13}\operatorname{Det}(A_{13}). \]

In other words, each term consists of the product of an element of the first row and the determinant of the $2 \times 2$ matrix obtained by deleting the first row and the $j$-th column, and putting the appropriate sign to this term as shown.


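Example 1. Let

\[ A = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 1 & 4 \\ -3 & 2 & 5 \end{pmatrix}. \]

Then

\[ A_{11} = \begin{pmatrix} 1 & 4 \\ 2 & 5 \end{pmatrix}, \quad A_{12} = \begin{pmatrix} 1 & 4 \\ -3 & 5 \end{pmatrix}, \quad A_{13} = \begin{pmatrix} 1 & 1 \\ -3 & 2 \end{pmatrix} \]

and our formula for the determinant of $A$ yields

\[ \begin{aligned} \operatorname{Det}(A) &= 2\begin{vmatrix} 1 & 4 \\ 2 & 5 \end{vmatrix} - 1\begin{vmatrix} 1 & 4 \\ -3 & 5 \end{vmatrix} + 0\begin{vmatrix} 1 & 1 \\ -3 & 2 \end{vmatrix} \\ &= 2(5 - 8) - 1(5 + 12) + 0 \\ &= -23. \end{aligned} \]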
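The expansion according to the first row can be written out as a short function. This is a sketch for the $3 \times 3$ case only. The matrix used is the one whose first row and $2 \times 2$ minors match the computation above, and the names `det2` and `det3_first_row` are chosen here for illustration.

```python
def det2(a, b, c, d):
    """Determinant ad - bc of the matrix with rows (a, b) and (c, d)."""
    return a * d - b * c

def det3_first_row(A):
    """Expansion of a 3 x 3 determinant according to the first row."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = A
    return (a11 * det2(a22, a23, a32, a33)
            - a12 * det2(a21, a23, a31, a33)
            + a13 * det2(a21, a22, a31, a32))

A = [[2, 1, 0], [1, 1, 4], [-3, 2, 5]]
print(det3_first_row(A))
```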


The determinant of a $3 \times 3$ matrix can be written as

\[ D(A) = \operatorname{Det}(A) = D(A^1, A^2, A^3). \]

We use this last expression if we wish to consider the determinant as a function of the columns of $A$.

Furthermore, there is no particular reason why we selected the expansion according to the first row. We can also use the second row, and write a similar sum, namely:

\[ -a_{21}\begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix} + a_{22}\begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix} - a_{23}\begin{vmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{vmatrix} \]

\[ = -a_{21}\operatorname{Det}(A_{21}) + a_{22}\operatorname{Det}(A_{22}) - a_{23}\operatorname{Det}(A_{23}). \]

Again, each term is the product of $a_{2j}$, the determinant of the $2 \times 2$ matrix obtained by deleting the second row and $j$-th column, and putting the appropriate sign in front of each term. This sign is determined according to the pattern:

\[ \begin{pmatrix} + & - & + \\ - & + & - \\ + & - & + \end{pmatrix}. \]

One can see directly that the determinant can be expanded according to any row by multiplying out all the terms, and expanding the $2 \times 2$ determinants, thus obtaining the determinant as an alternating sum of six terms:

\[ \operatorname{Det}(A) = a_{11}a_{22}a_{33} - a_{11}a_{32}a_{23} - a_{12}a_{21}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}. \tag{$*$} \]

We can also expand according to columns following the same principle. For instance, the expansion according to the first column:

\[ a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{21}\begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix} + a_{31}\begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix} \]

yields precisely the same six terms as in $(*)$.

In the case of $3 \times 3$ determinants, we therefore have the following result.


Theorem 2.1. The determinant satisfies the rule for expansion according to rows and columns, and $\operatorname{Det}(A) = \operatorname{Det}({}^tA)$. In other words, the determinant of a matrix is equal to the determinant of its transpose.


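Example 2. Compute the determinant

\[ \begin{vmatrix} 3 & 0 & 1 \\ 1 & 2 & 5 \\ -1 & 4 & 2 \end{vmatrix} \]

by expanding according to the second column.

The determinant is equal to

\[ 2\begin{vmatrix} 3 & 1 \\ -1 & 2 \end{vmatrix} - 4\begin{vmatrix} 3 & 1 \\ 1 & 5 \end{vmatrix} = 2(6 - (-1)) - 4(15 - 1) = -42. \]

Note that the presence of a 0 in the second column eliminates one term in the expansion, since this term would be 0.

We can also compute the above determinant by expanding according to the third column, namely the determinant is equal to

\[ +1\begin{vmatrix} 1 & 2 \\ -1 & 4 \end{vmatrix} - 5\begin{vmatrix} 3 & 0 \\ -1 & 4 \end{vmatrix} + 2\begin{vmatrix} 3 & 0 \\ 1 & 2 \end{vmatrix} = -42. \]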


Next, let $A = (a_{ij})$ be an arbitrary $n \times n$ matrix. Let $A_{ij}$ be the $(n - 1) \times (n - 1)$ matrix obtained by deleting the $i$-th row and $j$-th column from $A$, that is, from the array

\[ \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{in} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nj} & \cdots & a_{nn} \end{pmatrix}. \]

We give an expression for the determinant of an $n \times n$ matrix in terms of determinants of $(n - 1) \times (n - 1)$ matrices. Let $i$ be an integer, $1 \leq i \leq n$. We define

\[ D(A) = (-1)^{i+1}a_{i1}\operatorname{Det}(A_{i1}) + \cdots + (-1)^{i+n}a_{in}\operatorname{Det}(A_{in}). \]

This sum can be described in words. For each element of the $i$-th row, we have a contribution of one term in the sum. This term is equal to + or - the product of this element, times the determinant of the matrix obtained from $A$ by deleting the $i$-th row and the corresponding column. The sign + or - is determined according to the chess-board pattern:

\[ \begin{pmatrix} + & - & + & - & \cdots \\ - & + & - & + & \cdots \\ + & - & + & - & \cdots \\ \vdots & \vdots & \vdots & \vdots & \end{pmatrix}. \]

This sum is called the expansion of the determinant according to the $i$-th row.

Using more complicated notation which we omit in this book, one can show that Theorem 2.1 is also valid in the $n \times n$ case. In particular, the determinant satisfies the rule of expansion according to the $j$-th column, for any $j$. Thus we have the expansion formula:

\[ D(A) = (-1)^{1+j}a_{1j}D(A_{1j}) + \cdots + (-1)^{n+j}a_{nj}D(A_{nj}). \]

In practice, the computation of a determinant is always done by using an expansion according to some row or column.
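The inductive definition can be sketched as a recursive function. The code below is an illustration under stated assumptions, not an efficient method: indices are 0-based (which leaves the chess-board sign unchanged, since $i + j$ changes by 2), and the names `minor`, `det_by_row` and `det_by_column` are introduced here.

```python
def minor(A, i, j):
    """The matrix A_ij: delete row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det_by_row(A, i=0):
    """Expansion of the determinant according to row i (0-based)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** (i + j) * A[i][j] * det_by_row(minor(A, i, j))
               for j in range(len(A)))

def det_by_column(A, j=0):
    """Expansion of the determinant according to column j (0-based)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** (i + j) * A[i][j] * det_by_row(minor(A, i, j))
               for i in range(len(A)))

M = [[3, 0, 1], [1, 2, 5], [-1, 4, 2]]
print([det_by_row(M, i) for i in range(3)])
print([det_by_column(M, j) for j in range(3)])
```

Every row and every column gives the same value, in agreement with the rule of expansion stated above.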

We use the same notation for the determinant in the $n \times n$ case that we used in the $2 \times 2$ or $3 \times 3$ cases, namely

\[ |A| = D(A) = \operatorname{Det}(A) = D(A^1, \ldots, A^n). \]

The notation $D(A^1, \ldots, A^n)$ is especially suited to denote the determinant as a function of the columns, for instance to state the next theorem.

Theorem 2.2. The determinant satisfies the following properties:

1. As a function of each column vector, the determinant is linear, i.e. if the $j$-th column $A^j$ is equal to a sum of two column vectors, say $A^j = C + C'$, then

\[ D(A^1, \ldots, C + C', \ldots, A^n) = D(A^1, \ldots, C, \ldots, A^n) + D(A^1, \ldots, C', \ldots, A^n). \]

Furthermore, if $x$ is a number, then

\[ D(A^1, \ldots, xA^j, \ldots, A^n) = xD(A^1, \ldots, A^j, \ldots, A^n). \]

2. If two columns are equal, i.e. if $A^j = A^k$, with $j \neq k$, then the determinant $D(A)$ is equal to 0.

3. If $I$ is the unit matrix, then $D(I) = 1$.

The proof requires more complicated notation and will be omitted. It can be carried out by induction, and from the explicit formula giving the expansion of the determinant.

As an example we give the proof in the case of $3 \times 3$ determinants. The proof is by direct computation. Suppose say that the first column is a sum of two columns:

\[ A^1 = B + C, \quad\text{that is,}\quad \begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} + \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}. \]

Substituting in each term of $(*)$, we see that each term splits into a sum of two terms corresponding to $B$ and $C$. For instance,

\[ a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} = b_1\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} + c_1\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}, \]

\[ a_{12}\begin{vmatrix} b_2 + c_2 & a_{23} \\ b_3 + c_3 & a_{33} \end{vmatrix} = a_{12}\begin{vmatrix} b_2 & a_{23} \\ b_3 & a_{33} \end{vmatrix} + a_{12}\begin{vmatrix} c_2 & a_{23} \\ c_3 & a_{33} \end{vmatrix}, \]

and similarly for the third term. The proof with respect to the other column is analogous. Furthermore, if $x$ is a number, then

\[ \operatorname{Det}(xA^1, A^2, A^3) = xa_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} xa_{21} & a_{23} \\ xa_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} xa_{21} & a_{22} \\ xa_{31} & a_{32} \end{vmatrix} = x\operatorname{Det}(A^1, A^2, A^3). \]

Next, suppose that two columns are equal, for instance the first and second, so $A^1 = A^2$. Thus

\[ a_{11} = a_{12}, \quad a_{21} = a_{22}, \quad a_{31} = a_{32}. \]

Then again you can see directly that terms will cancel to make the determinant equal to 0.

Finally, if $I$ is the unit $3 \times 3$ matrix, then $\operatorname{Det}(I) = 1$ by using the expansion according to either rows or columns, because in such an expansion all but one term are equal to 0, and this single term is equal to 1 times the determinant of the unit $2 \times 2$ matrix, which is also equal to 1.


A function of several variables which is linear in each variable, i.e. which satisfies the first property of determinants, is called multilinear. A function which satisfies the second property is called alternating.


To compute determinants efficiently, we need additional properties which will be deduced simply from properties 1, 2, 3 of Theorem 2.2.

4. Let $j$, $k$ be integers with $1 \leq j \leq n$ and $1 \leq k \leq n$, and $j \neq k$. If the $j$-th column and $k$-th column are interchanged, then the determinant changes by a sign.


Proof. In the matrix $A$, we replace the $j$-th column and the $k$-th column by $A^j + A^k$. We obtain a matrix with two equal columns, so by property 2, the determinant is 0. We expand by property 1 to get:

\[ \begin{aligned} 0 &= D(\ldots, A^j + A^k, \ldots, A^j + A^k, \ldots) \\ &= D(\ldots, A^j, \ldots, A^j, \ldots) + D(\ldots, A^j, \ldots, A^k, \ldots) \\ &\quad + D(\ldots, A^k, \ldots, A^j, \ldots) + D(\ldots, A^k, \ldots, A^k, \ldots). \end{aligned} \]

Using property 2 again, we see that two of these four terms are equal to 0, and hence that

\[ 0 = D(\ldots, A^j, \ldots, A^k, \ldots) + D(\ldots, A^k, \ldots, A^j, \ldots). \]

In this last sum, one term must be equal to minus the other, thus proving property 4.


5. If one adds a scalar multiple of one column to another then the value of the determinant does not change.


Proof. Consider two different columns, say the $k$-th and $j$-th columns $A^k$ and $A^j$ with $k \neq j$. Let $x$ be a scalar. We add $xA^j$ to $A^k$. By property 1, the determinant becomes

\[ D(\ldots, \underset{k}{A^k + xA^j}, \ldots) = D(\ldots, \underset{k}{A^k}, \ldots) + D(\ldots, \underset{k}{xA^j}, \ldots) \]

(the $k$ points to the $k$-th column). In both terms on the right, the indicated column occurs in the $k$-th place. But $D(\ldots, A^k, \ldots)$ is simply $D(A)$. Furthermore,

\[ D(\ldots, \underset{k}{xA^j}, \ldots) = xD(\ldots, \underset{k}{A^j}, \ldots). \]

Since $k \neq j$, the determinant on the right has two equal columns, because $A^j$ occurs in the $k$-th place and also in the $j$-th place. Hence it is equal to 0. Hence

\[ D(\ldots, A^k + xA^j, \ldots) = D(\ldots, A^k, \ldots), \]

thereby proving our property 5.

Since the determinant of a matrix is equal to the determinant of its transpose, that is $\operatorname{Det}(A) = \operatorname{Det}({}^tA)$, we obtain the following general fact:


All the properties stated above for column operations are valid for both row and column operations.


For instance, if a scalar multiple of one row is added to another row, then the value of the determinant does not change.

With the above means at our disposal, we can now compute $3 \times 3$ determinants very efficiently. In doing so, we apply the operations described in property 5. We try to make as many entries as possible in the matrix $A$ equal to 0. We try especially to make all but one element of a column (or row) equal to 0, and then expand according to that column (or row). The expansion will contain only one term, and reduces our computation to a $2 \times 2$ determinant.


Example 3. Compute the determinant

\[ \begin{vmatrix} 3 & 0 & 1 \\ 1 & 2 & 5 \\ -1 & 4 & 2 \end{vmatrix}. \]

We already have 0 in the first row. We subtract twice the second row from the third row. Our determinant is then equal to

\[ \begin{vmatrix} 3 & 0 & 1 \\ 1 & 2 & 5 \\ -3 & 0 & -8 \end{vmatrix}. \]

We expand according to the second column. The expansion has only one term $\neq 0$, with a + sign, and that is:

\[ 2\begin{vmatrix} 3 & 1 \\ -3 & -8 \end{vmatrix}. \]

The $2 \times 2$ determinant can be evaluated by our definition $ad - bc$, and we find $2(-24 - (-3)) = -42$.

Similarly, we reduce the computation of a $4 \times 4$ determinant to that of $3 \times 3$ determinants, and then $2 \times 2$ determinants.
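The strategy of creating zeros by row operations can be automated. The following is a minimal sketch, not the hand procedure of the examples: it systematically clears the entries below the diagonal, keeps track of the sign under row interchanges (property 4), uses the fact that adding a multiple of one row to another leaves the determinant unchanged (property 5), and finally multiplies the diagonal entries of the resulting triangular matrix. The function name `det_by_elimination` is introduced here.

```python
from fractions import Fraction

def det_by_elimination(matrix):
    """Determinant via row operations over exact fractions."""
    A = [[Fraction(x) for x in row] for row in matrix]
    n = len(A)
    sign = 1
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry.
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            sign = -sign          # interchanging two rows changes the sign
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            # Subtracting a multiple of a row does not change the value.
            A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    product = Fraction(sign)
    for i in range(n):
        product *= A[i][i]        # determinant of a triangular matrix
    return product

print(det_by_elimination([[3, 0, 1], [1, 2, 5], [-1, 4, 2]]))
```

The same function applies unchanged to $4 \times 4$ and larger matrices.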


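Example 4. We wish to compute the determinant

\[ \begin{vmatrix} 1 & 3 & 1 & 1 \\ 2 & 1 & 5 & 2 \\ 1 & -1 & 2 & 3 \\ 4 & 1 & -3 & 7 \end{vmatrix}. \]

We add the third row to the second row, and then add the third row to the fourth row. This yields

\[ \begin{vmatrix} 1 & 3 & 1 & 1 \\ 3 & 0 & 7 & 5 \\ 1 & -1 & 2 & 3 \\ 5 & 0 & -1 & 10 \end{vmatrix}. \]

We then add three times the third row to the first row and get

\[ \begin{vmatrix} 4 & 0 & 7 & 10 \\ 3 & 0 & 7 & 5 \\ 1 & -1 & 2 & 3 \\ 5 & 0 & -1 & 10 \end{vmatrix}, \]

which we expand according to the second column. There is only one term, namely

\[ \begin{vmatrix} 4 & 7 & 10 \\ 3 & 7 & 5 \\ 5 & -1 & 10 \end{vmatrix}. \]

We subtract twice the second row from the first row, and then from the third row, yielding

\[ \begin{vmatrix} -2 & -7 & 0 \\ 3 & 7 & 5 \\ -1 & -15 & 0 \end{vmatrix}, \]

which we expand according to the third column, and get

\[ -5\begin{vmatrix} -2 & -7 \\ -1 & -15 \end{vmatrix} = -5(30 - 7) = -5(23) = -115. \]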

Exercises VII, §2 


1. Compute the following determinants. 


2 
(a) |0 


(d) 10 


N ON N 


2 3 
-1 (b) | 21 
l =) 
E = 
zd (e) | 4 
7 2 


- QOO t A N = 


oo O U9 U — tA 


— 115. 


NUA 
2. Compute the following determinants.


GE EH NE "AE 2 © ; 
0 1 1 3 "E NS 4 
2 5 5 
| 1. 1 of 9]|o 4 1 2| © f 3 
3 1 2 5 $3. fF dS 7 
4 -9 2 4 —1 1 2 0 0 
(l4 -9 2 () |2 0 0 1 1 0 
3 1! 0 1 5 7 8 5 7 
4 0 0 s 0 0 jo ey d 
(p|0 1 0 ()|0 3 0 (i) |3 1 5 
0 0 27 0 0 9 k 2 3 


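3. In general, what is the determinant of a diagonal matrix

\[ \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{pmatrix}? \]

4. Compute the determinant

\[ \begin{vmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{vmatrix}. \]

5. (a) Let $x_1$, $x_2$, $x_3$ be numbers. Show that

\[ \begin{vmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ 1 & x_3 & x_3^2 \end{vmatrix} = (x_2 - x_1)(x_3 - x_1)(x_3 - x_2). \]

*(b) If $x_1, \ldots, x_n$ are numbers, then show by induction that

\[ \begin{vmatrix} 1 & x_1 & \cdots & x_1^{n-1} \\ 1 & x_2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_n & \cdots & x_n^{n-1} \end{vmatrix} = \prod_{i<j}(x_j - x_i), \]

the symbol on the right meaning that it is the product of all terms $x_j - x_i$ with $i < j$ and $i$, $j$ integers from 1 to $n$. This determinant is called the Vandermonde determinant $V_n$. To do the induction easily, multiply each column by $x_1$ and subtract it from the next column on the right, starting from the right-hand side. You will find that

\[ V_n = (x_n - x_1) \cdots (x_2 - x_1)V_{n-1}. \]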




6. Find the determinants of the following matrices. 


1 2 5 
aio 1 7 
0 0 3 
2 —6 9 
(o 1 4 
0 0 8 
1 4 6 
elo o 1 
0 0 8 
| 5 2 
mio 2 
Slo o 4 
0 0 0 


(b) 0 


On — QN w 
N 


96 


5 


NANO 


w= O © 


— O OC © 
(i) Let $A$ be a triangular $n \times n$ matrix, say a matrix such that all components below the diagonal are equal to 0. What is $D(A)$?


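7. If $a(t)$, $b(t)$, $c(t)$, $d(t)$ are functions of $t$, one can form the determinant

\[ \begin{vmatrix} a(t) & b(t) \\ c(t) & d(t) \end{vmatrix} \]

just as with numbers. Write out in full the determinant

\[ \begin{vmatrix} \sin t & \cos t \\ -\cos t & \sin t \end{vmatrix}. \]

8. Write out in full the determinant

\[ \begin{vmatrix} t + 1 & t - 1 \\ t & 2t + 5 \end{vmatrix}. \]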




9. (With calculus) Let $f(t)$, $g(t)$ be two functions having derivatives of all orders. Let $\varphi(t)$ be the function obtained by taking the determinant

\[ \varphi(t) = \begin{vmatrix} f(t) & g(t) \\ f'(t) & g'(t) \end{vmatrix}. \]

Show that

\[ \varphi'(t) = \begin{vmatrix} f(t) & g(t) \\ f''(t) & g''(t) \end{vmatrix}, \]

i.e. the derivative is obtained by taking the derivative of the bottom row.


10. (With calculus). Let

\[ A(t) = \begin{pmatrix} b_1(t) & c_1(t) \\ b_2(t) & c_2(t) \end{pmatrix} \]

be a $2 \times 2$ matrix of differentiable functions. Let $B(t)$ and $C(t)$ be its column vectors. Let

\[ \varphi(t) = \operatorname{Det}(A(t)). \]

Show that

\[ \varphi'(t) = D(B'(t), C(t)) + D(B(t), C'(t)). \]

11. Let $c$ be a number and let $A$ be a $3 \times 3$ matrix. Show that

\[ D(cA) = c^3D(A). \]

12. Let $c$ be a number and let $A$ be an $n \times n$ matrix. Show that

\[ D(cA) = c^nD(A). \]

13. Let $c_1, \ldots, c_n$ be numbers. How do the determinants differ:

\[ D(c_1A^1, \ldots, c_nA^n) \quad\text{and}\quad D(A^1, \ldots, A^n)? \]

14. Write down explicitly the expansion of a $4 \times 4$ determinant according to the first row and according to the first column.


VII, 83. The Rank of a Matrix and Subdeterminants 


In this section we give a test for linear independence by using determin- 
ants. 


Theorem 3.1. Let A’,...,A" be column vectors of dimension n. They are 
linearly dependent if and only if 


D(A},...,A") = 0. 
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Proof. Suppose A!l,...,4" are linearly dependent, so there exists a 
relation 
xA! +---+x,A"=O 


with numbers x,,...,x, not all 0. Say x; #0. Subtracting and dividing 
by x;, we can find numbers c, with k #j such that 


A= y cA". 


k*j 


Thus 


k*j 


D(A) — (4^... Y 2) 


zy DAL A cessum 


k*j 


where A* occurs in the j-th place. But A* also occurs in the k-th place, 
and k zj. Hence the determinant is equal to 0 by property 2. This 
concludes the proof of the first part. 

As to the converse, we recall that a matrix is row equivalent to an
echelon matrix. Suppose that $A^1,\ldots,A^n$ are linearly independent. Then
the matrix
$$A = (A^1,\ldots,A^n)$$
is row equivalent to a triangular matrix. Indeed, it is row equivalent to
a matrix B in echelon form
$$\begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ 0 & b_{22} & \cdots & b_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & b_{nn} \end{pmatrix}$$
and the operations of row equivalence do not change the property of
rows or columns being linearly independent. Hence all the diagonal
elements $b_{11},\ldots,b_{nn}$ are $\neq 0$. The determinant of this matrix is the product
$$b_{11} \cdots b_{nn} \neq 0$$
by the rule of expansion according to columns. Under the operations of
row equivalence, the property of the determinant being $\neq 0$ does not
change, because row equivalences involve multiplying a row by a non-zero
scalar, which multiplies the determinant by this scalar; or interchanging
rows, which multiplies the determinant by $-1$; or adding a
multiple of one row to another, which does not change the value of the
determinant. Since $\mathrm{Det}(B) \neq 0$ it follows that $\mathrm{Det}(A) \neq 0$. This
concludes the proof.
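The test of Theorem 3.1 can be tried out numerically. The following Python sketch is only an illustration: the helper names `det` and `columns_dependent` and the sample vectors are assumptions made here, not notation from the text. The determinant is computed by expansion according to the first column, which is exact for integer entries.

```python
def det(rows):
    """Determinant by expansion according to the first column."""
    n = len(rows)
    if n == 0:
        return 1
    if n == 1:
        return rows[0][0]
    total = 0
    for i in range(n):
        # Delete row i and the first column to get the subdeterminant.
        minor = [row[1:] for k, row in enumerate(rows) if k != i]
        total += (-1) ** i * rows[i][0] * det(minor)
    return total


def columns_dependent(columns):
    """n column vectors of dimension n are dependent iff their determinant is 0."""
    n = len(columns)
    # Place the given vectors as the columns of a square matrix.
    rows = [[columns[j][i] for j in range(n)] for i in range(n)]
    return det(rows) == 0


# The third vector is the sum of the first two, so the vectors are dependent.
print(columns_dependent([(1, 2, 0), (0, 1, 1), (1, 3, 1)]))  # True
# The standard unit vectors are independent.
print(columns_dependent([(1, 0, 0), (0, 1, 0), (0, 0, 1)]))  # False
```

With floating point entries one would compare against a small tolerance instead of testing for exact equality with 0.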


Corollary 3.2. If $A^1,\ldots,A^n$ are column vectors of $\mathbf{R}^n$ such that
$D(A^1,\ldots,A^n) \neq 0$, and if B is a column vector, then there exist numbers
$x_1,\ldots,x_n$ such that
$$x_1A^1 + \cdots + x_nA^n = B.$$
These numbers are uniquely determined by B.

Proof. According to the theorem, $A^1,\ldots,A^n$ are linearly independent,
and hence form a basis of $\mathbf{R}^n$. Hence any vector of $\mathbf{R}^n$ can be written as
a linear combination of $A^1,\ldots,A^n$. Since $A^1,\ldots,A^n$ are linearly
independent, the numbers $x_1,\ldots,x_n$ are unique.


In terms of linear equations, this corollary shows: 


If a system of n linear equations in n unknowns has a matrix of coeffi- 
cients whose determinant is not 0, then this system has a unique solution. 


Since determinants can be used to test linear independence, they can 
be used to determine the rank of a matrix. 


Example 1. Let
$$A = \begin{pmatrix} 3 & 1 & 2 & 5 \\ 1 & 2 & -1 & 2 \\ 1 & 1 & 0 & 1 \end{pmatrix}.$$
This is a 3 x 4 matrix. Its rank is at most 3. If we can find three
linearly independent columns, then we know that its rank is exactly 3.
But the determinant
$$\begin{vmatrix} 1 & 2 & 5 \\ 2 & -1 & 2 \\ 1 & 0 & 1 \end{vmatrix} = 4$$
is not equal to 0. Hence rank A = 3.

It may be that in a 3 x 4 matrix, some determinant of a 3 x 3 submatrix
is 0, but the 3 x 4 matrix has rank 3. For instance, let
$$B = \begin{pmatrix} 3 & 1 & 2 & 5 \\ 1 & 2 & -1 & 2 \\ 4 & 3 & 1 & 1 \end{pmatrix}.$$
The determinant of the first three columns
$$\begin{vmatrix} 3 & 1 & 2 \\ 1 & 2 & -1 \\ 4 & 3 & 1 \end{vmatrix}$$
is equal to 0 (in fact, the last row is the sum of the first two rows). But
the determinant
$$\begin{vmatrix} 1 & 2 & 5 \\ 2 & -1 & 2 \\ 3 & 1 & 1 \end{vmatrix}$$
is not zero (what is it?) so that again the rank of B is equal to 3.
If the rank of a 3 x 4 matrix 


is 2 or less, then the determinant of every 3 x 3 submatrix must be 0, 
otherwise we could argue as above to get three linearly independent 
columns. We note that there are four such subdeterminants, obtained by 
eliminating successively any one of the four columns. Conversely, if 
every subdeterminant of every 3 x 3 submatrix is equal to 0, then it
is easy to see that the rank is at most 2. Because if the rank were equal
to 3, then there would be three linearly independent columns, and their 
determinant would not be 0. Thus we can compute such subdetermin- 
ants to get an estimate on the rank, and then use trial and error, and 
some judgment, to get the exact rank. 
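The trial and error described above can be mechanized for small matrices. The sketch below is an illustration only: the function name `rank_by_subdeterminants` and the sample matrix are assumptions chosen here, and the brute-force search over all square submatrices is practical only for small sizes. It returns the largest k for which some k x k subdeterminant is not 0.

```python
from itertools import combinations


def det(rows):
    """Determinant by expansion according to the first column."""
    n = len(rows)
    if n == 0:
        return 1
    if n == 1:
        return rows[0][0]
    total = 0
    for i in range(n):
        minor = [row[1:] for k, row in enumerate(rows) if k != i]
        total += (-1) ** i * rows[i][0] * det(minor)
    return total


def rank_by_subdeterminants(rows):
    """Largest k such that some k x k subdeterminant is non-zero."""
    m, n = len(rows), len(rows[0])
    for k in range(min(m, n), 0, -1):
        for row_idx in combinations(range(m), k):
            for col_idx in combinations(range(n), k):
                sub = [[rows[i][j] for j in col_idx] for i in row_idx]
                if det(sub) != 0:
                    return k
    return 0


# Hypothetical matrix whose last row is the sum of the first two rows:
# every 3 x 3 subdeterminant is 0, but a 2 x 2 subdeterminant is not.
M = [[3, 1, 2, 5],
     [1, 2, -1, 2],
     [4, 3, 1, 7]]
print(rank_by_subdeterminants(M))  # 2
```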


Example 2. Let 


If we compute every 3 x 3 subdeterminant, we shall find 0. Hence the 
rank of C is at most equal to 2. However, the first two rows are linearly 
independent, for instance because the determinant
$$\begin{vmatrix} 3 & 1 \\ 1 & 2 \end{vmatrix}$$
is not equal to 0. It is the determinant of the first two columns of the
2 x 4 matrix
$$\begin{pmatrix} 3 & 1 & 2 & 3 \\ 1 & 2 & -1 & 2 \end{pmatrix}.$$

Hence the rank is equal to 2. 


Of course, if we notice that the last row of C is equal to the sum of
the first two, then we see at once that the rank is $\leq 2$.


Exercises VII, §3 


1. 


Compute the ranks of the following matrices. 


b = & s 3 5 Xo 4 

( PER d 2|2 -1 1 1| 

5 4 2 5 

3 5 1 4 3 5 l 4 

f2<1 1 1 4[2 -1 1 1 
: 9 3 9 : 1 2 j 
-1 1 6 5 2 1 6 6 
| 1 1 2 3 ( 3 4d T. 
di 2 5 4 Ss 2. a 6 
2- 1 0 4 -2 4 3 2 
2 1 6 6 3 1 1 -1 
3 1 1 -1 2 4. 3» 3 
Ws. 2 7 5 "lon d$ m$ 3 
8 3 8 4 T w. € d 


2. (With calculus). Let $\alpha_1,\ldots,\alpha_n$ be distinct numbers, $\neq 0$. Show that the
functions
$$e^{\alpha_1 t},\ldots,e^{\alpha_n t}$$
are linearly independent over the numbers. [Hint: Suppose we have a linear
relation
$$c_1e^{\alpha_1 t} + \cdots + c_ne^{\alpha_n t} = 0$$
with constants $c_i$, valid for all t. If not all $c_i$ are 0, without loss of generality,
we may assume that none of them is 0. Differentiate the above relation n - 1
times. You get a system of linear equations. The determinant of its
coefficients must be zero. (Why?) Get a contradiction from this.]


VII, §4. Cramer's Rule


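The properties of determinants can be used to prove a well-known rule
used in solving linear equations.

Theorem 4.1. Let $A^1,\ldots,A^n$ be column vectors such that
$$D(A^1,\ldots,A^n) \neq 0.$$
Let B be a column vector. If $x_1,\ldots,x_n$ are numbers such that
$$x_1A^1 + \cdots + x_nA^n = B,$$
then for each $j = 1,\ldots,n$ we have
$$x_j = \frac{D(A^1,\ldots,B,\ldots,A^n)}{D(A^1,\ldots,A^n)}$$
where B occurs in the j-th column instead of $A^j$. In other words,
$$x_j = \frac{\begin{vmatrix} a_{11} & \cdots & b_1 & \cdots & a_{1n} \\ a_{21} & \cdots & b_2 & \cdots & a_{2n} \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & b_n & \cdots & a_{nn} \end{vmatrix}}{\begin{vmatrix} a_{11} & \cdots & a_{1j} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2j} & \cdots & a_{2n} \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & a_{nj} & \cdots & a_{nn} \end{vmatrix}}.$$
(The numerator is obtained from A by replacing the j-th column $A^j$ by
B. The denominator is the determinant of the matrix A.)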


Theorem 4.1 gives us an explicit way of finding the coordinates of B
with respect to $A^1,\ldots,A^n$. In the language of linear equations, Theorem
4.1 allows us to solve explicitly in terms of determinants the system of n
linear equations in n unknowns:
$$\begin{aligned} x_1a_{11} + \cdots + x_na_{1n} &= b_1, \\ &\;\;\vdots \\ x_1a_{n1} + \cdots + x_na_{nn} &= b_n. \end{aligned}$$


We prove Theorem 4.1. 

Let B be written as in the statement of the theorem, and consider the
determinant of the matrix obtained by replacing the j-th column $A^j$ by B.
Then
$$D(A^1,\ldots,B,\ldots,A^n) = D(A^1,\ldots,x_1A^1 + \cdots + x_nA^n,\ldots,A^n).$$




We use property 1 and obtain a sum:
$$D(A^1,\ldots,x_1A^1,\ldots,A^n) + \cdots + D(A^1,\ldots,x_jA^j,\ldots,A^n) + \cdots + D(A^1,\ldots,x_nA^n,\ldots,A^n),$$
which by property 1 again, is equal to
$$x_1D(A^1,\ldots,A^1,\ldots,A^n) + \cdots + x_jD(A^1,\ldots,A^n) + \cdots + x_nD(A^1,\ldots,A^n,\ldots,A^n).$$
In every term of this sum except the j-th term, two column vectors are
equal. Hence every term except the j-th term is equal to 0, by property
2. The j-th term is equal to
$$x_jD(A^1,\ldots,A^n),$$
and is therefore equal to the determinant we started with, namely
$D(A^1,\ldots,B,\ldots,A^n)$. We can solve for $x_j$, and obtain precisely the
expression given in the statement of the theorem.

The rule of Theorem 4.1 giving us the solution to the system of linear 
equations by means of determinants, is known as Cramer’s rule. 


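Example. Solve the system of linear equations:
$$\begin{aligned} 3x + 2y + 4z &= 1, \\ 2x - y + z &= 0, \\ x + 2y + 3z &= 1. \end{aligned}$$
We have:
$$x = \frac{\begin{vmatrix} 1 & 2 & 4 \\ 0 & -1 & 1 \\ 1 & 2 & 3 \end{vmatrix}}{\begin{vmatrix} 3 & 2 & 4 \\ 2 & -1 & 1 \\ 1 & 2 & 3 \end{vmatrix}}, \qquad y = \frac{\begin{vmatrix} 3 & 1 & 4 \\ 2 & 0 & 1 \\ 1 & 1 & 3 \end{vmatrix}}{\begin{vmatrix} 3 & 2 & 4 \\ 2 & -1 & 1 \\ 1 & 2 & 3 \end{vmatrix}}, \qquad z = \frac{\begin{vmatrix} 3 & 2 & 1 \\ 2 & -1 & 0 \\ 1 & 2 & 1 \end{vmatrix}}{\begin{vmatrix} 3 & 2 & 4 \\ 2 & -1 & 1 \\ 1 & 2 & 3 \end{vmatrix}}.$$
Observe how the column
$$B = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$$
shifts from the first column when solving for x, to the second column
when solving for y, to the third column when solving for z. The
denominator in all three expressions is the same, namely it is the determinant
of the matrix of coefficients of the equations.

We know how to compute 3 x 3 determinants, and we then find
$$x = -\tfrac{1}{5}, \qquad y = 0, \qquad z = \tfrac{2}{5}.$$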
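Cramer's rule is easy to try out by machine on small systems. The following Python sketch is an illustration under stated assumptions: the function name `cramer` is chosen here, and the system is the one of the example read with right-hand side (1, 0, 1). Exact fractions are used so that no rounding enters.

```python
from fractions import Fraction


def det(rows):
    """Determinant by expansion according to the first column."""
    n = len(rows)
    if n == 0:
        return 1
    if n == 1:
        return rows[0][0]
    total = 0
    for i in range(n):
        minor = [row[1:] for k, row in enumerate(rows) if k != i]
        total += (-1) ** i * rows[i][0] * det(minor)
    return total


def cramer(a, b):
    """Solve the square system with coefficient rows a and right-hand side b."""
    d = det(a)
    if d == 0:
        raise ValueError("Cramer's rule needs a non-zero determinant")
    solution = []
    for j in range(len(a)):
        # Replace the j-th column of the coefficient matrix by b.
        replaced = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det(replaced), d))
    return solution


coefficients = [[3, 2, 4],
                [2, -1, 1],
                [1, 2, 3]]
print(cramer(coefficients, [1, 0, 1]))
# [Fraction(-1, 5), Fraction(0, 1), Fraction(2, 5)]
```

For large systems this is far slower than elimination; the rule is mainly of theoretical interest and convenient for small n.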


Exercise VII, §4


1. Solve the following systems of linear equations.

(a) 3x + y - z = 0
    x + y + z = 0
    y - z = 1

(b) 2x - y + z = 1
    x + 3y - 2z = 0
    4x - 3y + z = 2

(c) 4x + y + z + w = 1
    x - y + 2z - 3w = 0
    2x + y + 3z + 5w = 0
    x + y - z - w = 2

(d) x + 2y - 3z + 5w = 0
    2x + y - 4z - w = 1
    x + y + z + w = 0
    -x - y - z + w = 4


VII, §5. Inverse of a Matrix 


We consider first a special case. Let
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
be a 2 x 2 matrix, and assume that its determinant $ad - bc \neq 0$. We
wish to find an inverse for A, that is a 2 x 2 matrix
$$X = \begin{pmatrix} x & y \\ z & w \end{pmatrix}$$
such that
$$AX = XA = I.$$
Let us look at the first requirement, AX = I, which written out in full,
looks like this:
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x & y \\ z & w \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
Let us look at the first column of AX. We must solve the equations
$$\begin{aligned} ax + bz &= 1, \\ cx + dz &= 0. \end{aligned}$$
This is a system of two equations in two unknowns, x and z, which we
know how to solve. Similarly, looking at the second column, we see that
we must solve a system of two equations in the unknowns y, w, namely
$$\begin{aligned} ay + bw &= 0, \\ cy + dw &= 1. \end{aligned}$$


Example. Let
$$A = \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}.$$
We seek a matrix X such that AX = I. We must therefore solve the
systems of linear equations
$$\begin{aligned} 2x + z &= 1, \\ 4x + 3z &= 0, \end{aligned} \qquad\text{and}\qquad \begin{aligned} 2y + w &= 0, \\ 4y + 3w &= 1. \end{aligned}$$
By the ordinary method of solving two equations in two unknowns, we
find
$$x = \tfrac{3}{2}, \quad z = -2 \qquad\text{and}\qquad y = -\tfrac{1}{2}, \quad w = 1.$$
Thus the matrix
$$X = \begin{pmatrix} \tfrac{3}{2} & -\tfrac{1}{2} \\ -2 & 1 \end{pmatrix}$$
is such that AX = I. The reader will also verify by direct multiplication
that XA = I. This solves for the desired inverse.

Similarly, in the 3 x 3 case, we would find three systems of linear
equations, corresponding to the first column, the second column, and the
third column. Each system could be solved to yield the inverse. We
shall now give the general argument.

Let A be an n x n matrix. If B is a matrix such that AB = I and
BA = I (I = unit n x n matrix), then we called B an inverse of A, and we
write $B = A^{-1}$. If there exists an inverse of A, then it is unique. Indeed,
let C be an inverse of A. Then CA = I. Multiplying by B on the right,
we obtain CAB = B. But CAB = C(AB) = CI = C. Hence C = B. A
similar argument works for AC = I.


Theorem 5.1. Let $A = (a_{ij})$ be an n x n matrix, and assume that
$D(A) \neq 0$. Then A is invertible. Let $E^j$ be the j-th column unit vector,
and let
$$b_{ij} = \frac{D(A^1,\ldots,E^j,\ldots,A^n)}{D(A)},$$
where $E^j$ occurs in the i-th place. Then the matrix $B = (b_{ij})$ is an
inverse for A.


Proof. Let $X = (x_{ij})$ be an unknown n x n matrix. We wish to solve
for the components $x_{ij}$, so that they satisfy AX = I. From the definition
of products of matrices, this means that for each j, we must solve
$$E^j = x_{1j}A^1 + \cdots + x_{nj}A^n.$$
This is a system of linear equations, which can be solved uniquely by
Cramer's rule, and we obtain
$$x_{ij} = \frac{D(A^1,\ldots,E^j,\ldots,A^n)}{D(A)},$$
which is the formula given in the theorem.

We must still prove that XA = I. Note that $D({}^tA) \neq 0$. Hence by
what we have already proved, we can find a matrix Y such that ${}^tA\,Y = I$.
Taking transposes, we obtain ${}^tY A = I$. Now we have
$$I = {}^tY(AX)A = {}^tYA(XA) = XA,$$
thereby proving what we want, namely that X = B is an inverse for A.


We can write out the components of the matrix B in Theorem 5.1 as
follows:
$$b_{ij} = \frac{1}{\mathrm{Det}(A)} \begin{vmatrix} a_{11} & \cdots & 0 & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots \\ a_{j1} & \cdots & 1 & \cdots & a_{jn} \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & 0 & \cdots & a_{nn} \end{vmatrix}.$$
If we expand the determinant in the numerator according to the i-th
column, then all terms but one are equal to 0, and hence we obtain the
numerator of $b_{ij}$ as a subdeterminant of Det(A). Let $A_{ij}$ be the matrix
obtained from A by deleting the i-th row and the j-th column. Then
$$b_{ij} = \frac{(-1)^{i+j}\,\mathrm{Det}(A_{ji})}{\mathrm{Det}(A)}$$
(note the reversal of indices!) and thus we have the formula
$$A^{-1} = \text{transpose of } \left( \frac{(-1)^{i+j}\,\mathrm{Det}(A_{ij})}{\mathrm{Det}(A)} \right).$$

A square matrix whose determinant is $\neq 0$, or equivalently which
admits an inverse, is called non-singular.
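The formula for the inverse in terms of subdeterminants can be checked on a small matrix. The Python sketch below is illustrative only: the names `det` and `inverse` and the sample 3 x 3 matrix are assumptions made here. Note how the entry in position (i, j) of the inverse uses the submatrix obtained by deleting row j and column i, which is the reversal of indices mentioned above.

```python
from fractions import Fraction


def det(rows):
    """Determinant by expansion according to the first column."""
    n = len(rows)
    if n == 0:
        return 1
    if n == 1:
        return rows[0][0]
    total = 0
    for i in range(n):
        minor = [row[1:] for k, row in enumerate(rows) if k != i]
        total += (-1) ** i * rows[i][0] * det(minor)
    return total


def inverse(a):
    """Inverse of a square matrix by the subdeterminant formula."""
    n = len(a)
    d = det(a)
    if d == 0:
        raise ValueError("the matrix is singular")

    def delete(i, j):
        # Submatrix obtained by deleting row i and column j.
        return [[a[r][c] for c in range(n) if c != j]
                for r in range(n) if r != i]

    # Entry (i, j) of the inverse uses the submatrix with row j, column i deleted.
    return [[Fraction((-1) ** (i + j) * det(delete(j, i)), d)
             for j in range(n)] for i in range(n)]


A = [[3, 1, -2],
     [-1, 1, 2],
     [1, -2, 1]]
first_column = [row[0] for row in inverse(A)]
print(first_column)  # [Fraction(5, 16), Fraction(3, 16), Fraction(1, 16)]
```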


Example. Find the inverse of the matrix
$$A = \begin{pmatrix} 3 & 1 & -2 \\ -1 & 1 & 2 \\ 1 & -2 & 1 \end{pmatrix}.$$
By the formula, we have
$${}^tA^{-1} = \left( \frac{(-1)^{i+j}\,\mathrm{Det}(A_{ij})}{\mathrm{Det}(A)} \right).$$
For i = 1, j = 1 the matrix $A_{11}$ is obtained by deleting the first row
and first column, that is
$$A_{11} = \begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix}$$
and $\mathrm{Det}(A_{11}) = 1 - (-4) = 5$.

For i = 1, j = 2, the matrix $A_{12}$ is obtained by deleting the first row
and second column, that is
$$A_{12} = \begin{pmatrix} -1 & 2 \\ 1 & 1 \end{pmatrix}$$
and $\mathrm{Det}(A_{12}) = -1 - 2 = -3$.

For i = 1, j = 3, the matrix $A_{13}$ is obtained by deleting the first row
and third column, that is
$$A_{13} = \begin{pmatrix} -1 & 1 \\ 1 & -2 \end{pmatrix}$$
and $\mathrm{Det}(A_{13}) = 2 - 1 = 1$.

We can compute Det(A) = 16. Then the first row of ${}^tA^{-1}$ is
$$\tfrac{1}{16}(5, 3, 1).$$
Therefore the first column of $A^{-1}$ is
$$\frac{1}{16}\begin{pmatrix} 5 \\ 3 \\ 1 \end{pmatrix}.$$
Observe the sign changes due to the sign pattern $(-1)^{i+j}$.

We leave the computation of the other columns of $A^{-1}$ to the reader.
We shall assume without proof: 


Theorem 5.2. For any two n x n matrices A, B the determinant of the 
product is equal to the product of the determinants, that is 


Det(AB) = Det(A) Det(B). 


Then as a special case, we find that for an invertible matrix A,
$$\mathrm{Det}(A^{-1}) = \mathrm{Det}(A)^{-1}.$$
Indeed, we have
$$AA^{-1} = I,$$
and applying the rule for a product, we get
$$D(A)D(A^{-1}) = 1,$$
thus proving the formula for the determinant of the inverse.
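Since Theorem 5.2 is assumed without proof, it may be reassuring to check it on an example. The sketch below does this for 2 x 2 matrices; the helper names and the two sample matrices are assumptions chosen here for illustration, and a check on one example is of course not a proof.

```python
from fractions import Fraction


def det2(m):
    """Determinant ad - bc of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def matmul2(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse2(m):
    """Inverse of a 2 x 2 matrix with non-zero determinant."""
    d = det2(m)
    return [[Fraction(m[1][1], d), Fraction(-m[0][1], d)],
            [Fraction(-m[1][0], d), Fraction(m[0][0], d)]]


A = [[2, 1], [4, 3]]
B = [[1, -1], [2, 5]]

# Determinant of the product against the product of the determinants.
print(det2(matmul2(A, B)), det2(A) * det2(B))  # 14 14
# Determinant of the inverse against the inverse of the determinant.
print(det2(inverse2(A)), Fraction(1, det2(A)))  # 1/2 1/2
```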


Exercises VII, §5

1. Using determinants, find the inverses of the matrices in Chapter II, §5.

2. Write down explicitly the inverse of a 2 x 2 matrix
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
assuming that $ad - bc \neq 0$.


VII, §6. Determinants as Area and Volume


It is remarkable that the determinant has an interpretation as a volume. 
We discuss first the 2-dimensional case, and thus speak of area, although 
we write Vol for the area of a 2-dimensional figure, to keep the termino- 
logy which generalizes to higher dimensions. 

Consider the parallelogram spanned by two vectors v, w. 

By definition, this parallelogram is the set of all linear combinations 


$$t_1v + t_2w \qquad\text{with}\qquad 0 \leq t_i \leq 1.$$

Figure 1


We view v, w as column vectors, and can thus form their determinant 
D(v, w). This determinant may be positive or negative since
$$D(v, w) = -D(w, v).$$
Thus the determinant itself cannot be the area of this parallelogram,
since area is always $\geq 0$. However, we shall prove:


Theorem 6.1. The area of the parallelogram spanned by v, w is equal to 
the absolute value of the determinant, namely |D(v, w)|. 


To prove Theorem 6.1, we introduce the notion of oriented area. Let
P(v, w) be the parallelogram spanned by v and w. We denote by
$\mathrm{Vol}_0(v, w)$ the area of P(v, w) if the determinant $D(v, w) \geq 0$, and minus
the area of P(v, w) if the determinant $D(v, w) < 0$. Thus at least
$\mathrm{Vol}_0(v, w)$ has the same sign as the determinant, and we call $\mathrm{Vol}_0(v, w)$
the oriented area. We denote by Vol(v, w) the area of the parallelogram
spanned by v, w. Hence $\mathrm{Vol}_0(v, w) = \pm \mathrm{Vol}(v, w)$.

To prove Theorem 6.1, it will suffice to prove: 


The oriented area is equal to the determinant. In other words,
$$\mathrm{Vol}_0(v, w) = D(v, w).$$

Now to prove this, it will suffice to prove that $\mathrm{Vol}_0$ satisfies the three
properties characteristic of a determinant, namely:

1. $\mathrm{Vol}_0$ is linear in each variable v and w.
2. $\mathrm{Vol}_0(v, v) = 0$ for all v.
3. $\mathrm{Vol}_0(E^1, E^2) = 1$ if $E^1$, $E^2$ are the standard unit vectors.


We know that these three properties characterize determinants, and
this was proved in Theorem 1.1. For the convenience of the reader, we
repeat the argument here very briefly. We assume that we have a
function g satisfying these three properties (with g replacing $\mathrm{Vol}_0$). Then for
any vectors
$$v = aE^1 + cE^2 \qquad\text{and}\qquad w = bE^1 + dE^2$$
we have
$$g(aE^1 + cE^2, bE^1 + dE^2) = ab\,g(E^1, E^1) + ad\,g(E^1, E^2) + cb\,g(E^2, E^1) + cd\,g(E^2, E^2).$$
The first and fourth term are equal to 0. By Exercise 1,
$$g(E^2, E^1) = -g(E^1, E^2)$$
and hence
$$g(v, w) = (ad - bc)\,g(E^1, E^2) = ad - bc.$$
This proves what we wanted.

In order to prove that $\mathrm{Vol}_0$ satisfies the three properties, we shall use
simple properties of area (or volume) like the following: The area of a
line segment is equal to 0. If A is a certain region, then the area of A is
the same as the area of a translation of A, i.e. the same as the area of
the region $A_w$ (consisting of all points v + w with $v \in A$). If A, B are
regions which are disjoint or such that their common points have area
equal to 0, then
$$\mathrm{Vol}(A \cup B) = \mathrm{Vol}(A) + \mathrm{Vol}(B).$$

Consider now $\mathrm{Vol}_0$. The last two properties are obvious. Indeed, the
parallelogram spanned by v, v is simply a line segment, and its
2-dimensional area is therefore equal to 0. Thus property 2 is satisfied. As for
the third property, the parallelogram spanned by the unit vectors $E^1$, $E^2$
is simply the unit square, whose area is 1. Hence in this case we have
$$\mathrm{Vol}_0(E^1, E^2) = 1.$$

The harder property is the first. The reader who has not already done 
so, should now read the geometric applications of Chapter III, §2 before 
reading the rest of this proof, which we shall base on geometric consider- 
ations concerning area. 

We shall need a lemma. 

Lemma 6.2. If v, w are linearly dependent, then $\mathrm{Vol}_0(v, w) = 0$.

Proof. Suppose that we can write
$$av + bw = O$$
with a or b $\neq 0$. Say $a \neq 0$. Then
$$v = -\frac{b}{a}\,w = cw$$
so that v, w lie on the same straight line, and the parallelogram spanned
by v, w is a line segment (Fig. 2). Hence $\mathrm{Vol}_0(v, w) = 0$, thus proving the
lemma.

Figure 2

We also know that when v, w are linearly dependent, then D(v, w) = 0,
so in this trivial case, our theorem is proved. In the subsequent lemmas,
we assume that v, w are linearly independent.


Lemma 6.3. Assume that v, w are linearly independent, and let n be a
positive integer. Then
$$\mathrm{Vol}(nv, w) = n\,\mathrm{Vol}(v, w).$$

Proof. The parallelogram spanned by nv and w consists of n
parallelograms as shown in the following picture.

Figure 3

These n parallelograms are simply the translations of P(v, w) by
$v, 2v, \ldots, (n - 1)v$, and each translation of P(v, w) has the same area
as P(v, w). These translations have only line segments in common,
and hence
$$\mathrm{Vol}(nv, w) = n\,\mathrm{Vol}(v, w)$$


as desired.

Corollary 6.4. Assume that v, w are linearly independent and let n be a
positive integer. Then
$$\mathrm{Vol}\Bigl(\frac{1}{n}\,v, w\Bigr) = \frac{1}{n}\,\mathrm{Vol}(v, w).$$
If m, n are positive integers, then
$$\mathrm{Vol}\Bigl(\frac{m}{n}\,v, w\Bigr) = \frac{m}{n}\,\mathrm{Vol}(v, w).$$

Proof. Let $v_1 = (1/n)v$. By the lemma, we know that
$$\mathrm{Vol}(nv_1, w) = n\,\mathrm{Vol}(v_1, w).$$
This is merely a reformulation of our first assertion, since $nv_1 = v$. As for
the second assertion, we write $m/n = m \cdot 1/n$ and apply the proved
statements successively:
$$\mathrm{Vol}\Bigl(m \cdot \frac{1}{n}\,v, w\Bigr) = m\,\mathrm{Vol}\Bigl(\frac{1}{n}\,v, w\Bigr) = m \cdot \frac{1}{n}\,\mathrm{Vol}(v, w) = \frac{m}{n}\,\mathrm{Vol}(v, w).$$


Lemma 6.5. $\mathrm{Vol}(-v, w) = \mathrm{Vol}(v, w)$.

Proof. The parallelogram spanned by $-v$ and w is a translation by
$-v$ of the parallelogram P(v, w). Hence P(v, w) and P(-v, w) have the
same area. (Cf. Fig. 4.)

Figure 4

Lemma 6.6. If c is any real number > 0, then
$$\mathrm{Vol}(cv, w) = c\,\mathrm{Vol}(v, w).$$

Proof. Let r, r' be rational numbers such that 0 < r < c < r' (Fig. 5).
Then
$$P(rv, w) \subset P(cv, w) \subset P(r'v, w).$$

Figure 5

Hence by Lemma 6.3,
$$r\,\mathrm{Vol}(v, w) = \mathrm{Vol}(rv, w) \leq \mathrm{Vol}(cv, w) \leq \mathrm{Vol}(r'v, w) = r'\,\mathrm{Vol}(v, w).$$
Letting r and r' approach c as a limit, we find that
$$\mathrm{Vol}(cv, w) = c\,\mathrm{Vol}(v, w),$$
as was to be shown.


From Lemmas 6.5 and 6.6 we can now prove that
$$\mathrm{Vol}_0(cv, w) = c\,\mathrm{Vol}_0(v, w)$$
for any real number c, and any vectors v, w. Indeed, if v, w are linearly
dependent, then both sides are equal to 0. If v, w are linearly
independent, we use the definition of $\mathrm{Vol}_0$ and Lemmas 6.5, 6.6. Say D(v, w) > 0
and c is negative, c = -d. Then $D(cv, w) \leq 0$ and consequently
$$\begin{aligned} \mathrm{Vol}_0(cv, w) &= -\mathrm{Vol}(cv, w) = -\mathrm{Vol}(-dv, w) \\ &= -\mathrm{Vol}(dv, w) \\ &= -d\,\mathrm{Vol}(v, w) \\ &= c\,\mathrm{Vol}(v, w) = c\,\mathrm{Vol}_0(v, w). \end{aligned}$$
A similar argument works when $D(v, w) \leq 0$. We have therefore proved
one of the conditions of linearity of the function $\mathrm{Vol}_0$. The analogous
property of course works on the other side, namely
$$\mathrm{Vol}_0(v, cw) = c\,\mathrm{Vol}_0(v, w).$$


For the other condition, we again have a lemma.

Lemma 6.7. Assume that v, w are linearly independent. Then
$$\mathrm{Vol}(v + w, w) = \mathrm{Vol}(v, w).$$


Proof. We have to prove that the parallelogram spanned by v, w has 
the same area as the parallelogram spanned by v + w, w. 


Figure 6 


The parallelogram spanned by v, w consists of two triangles A and B as 
shown in the picture. The parallelogram spanned by v + w and w con- 
sists of the triangles B and the translation of A by w. Since A and 
A +w have the same area, we get: 


Vol(v, w) = Vol(A) + Vol(B) = Vol(A + w) + Vol(B) = Vol(v + w, w), 
as was to be shown. 

We are now in a position to deal with the second property of linear- 
ity. Let w be a fixed non-zero vector in the plane, and let v be a vector 
such that {v,w} is a basis of the plane. We shall prove that for any 
numbers c, d we have 


$$(1) \qquad \mathrm{Vol}_0(cv + dw, w) = c\,\mathrm{Vol}_0(v, w).$$

Indeed, if d = 0, this is nothing but what we have shown previously. If
$d \neq 0$, then again by what has been shown previously,
$$d\,\mathrm{Vol}_0(cv + dw, w) = \mathrm{Vol}_0(cv + dw, dw) = c\,\mathrm{Vol}_0(v, dw) = cd\,\mathrm{Vol}_0(v, w).$$
Canceling d yields relation (1).

From this last formula, the linearity now follows. Indeed, if
$$v_1 = c_1v + d_1w \qquad\text{and}\qquad v_2 = c_2v + d_2w,$$
then
$$\begin{aligned} \mathrm{Vol}_0(v_1 + v_2, w) &= \mathrm{Vol}_0((c_1 + c_2)v + (d_1 + d_2)w, w) \\ &= (c_1 + c_2)\,\mathrm{Vol}_0(v, w) \\ &= c_1\,\mathrm{Vol}_0(v, w) + c_2\,\mathrm{Vol}_0(v, w) \\ &= \mathrm{Vol}_0(v_1, w) + \mathrm{Vol}_0(v_2, w). \end{aligned}$$
This concludes the proof of the fact that
$$\mathrm{Vol}_0(v, w) = D(v, w),$$
and hence of Theorem 6.1.


Remark 1. The proof given above is slightly long, but each step is 
quite simple. Furthermore, when one wishes to generalize the proof to 
higher dimensional space (even 3-space), one can give an entirely similar 
proof. The reason for this is that the conditions characterizing a deter- 
minant involve only two coordinates at a time and thus always take 
place in some 2-dimensional plane. Keeping all but two coordinates 
fixed, the above proof then can be extended at once. Thus for instance 
in 3-space, let us denote by P(u, v, w) the box spanned by vectors u, v, w 
(Fig. 7) namely all combinations 


$$t_1u + t_2v + t_3w \qquad\text{with}\qquad 0 \leq t_i \leq 1.$$

Let Vol(u, v, w) be the volume of this box.

Figure 7

Theorem 6.8. The volume of the box spanned by u, v, w is the absolute
value of the determinant |D(u, v, w)|. That is,
$$\mathrm{Vol}(u, v, w) = |D(u, v, w)|.$$


The proof follows exactly the same pattern as in the two-dimensional 
case. Indeed, the volume of the cube spanned by the unit vectors is 1. If 
two of the vectors u, v, w are equal, then the box is actually a 2-di- 
mensional parallelogram, whose 3-dimensional volume is 0. Finally, the 
proof of linearity is the same, because all the action took place either in 
one or in two variables. The other variables can just be carried on in 
the notation but they did not enter in an essential way in the proof. 

Similarly, one can define n-dimensional volumes, and the correspond- 
ing theorem runs as follows. 


Theorem 6.9. Let $v_1,\ldots,v_n$ be elements of $\mathbf{R}^n$. Let $\mathrm{Vol}(v_1,\ldots,v_n)$ be the
n-dimensional volume of the n-dimensional box spanned by $v_1,\ldots,v_n$.
Then
$$\mathrm{Vol}(v_1,\ldots,v_n) = |D(v_1,\ldots,v_n)|.$$

Of course, the n-dimensional box spanned by $v_1,\ldots,v_n$ is the set of
linear combinations
$$\sum_{i=1}^{n} t_iv_i \qquad\text{with}\qquad 0 \leq t_i \leq 1.$$


Remark 2. We have used geometric properties of area to carry out 
the above proof. One can lay foundations for all this purely analytically. 
If the reader is interested, cf. my book Undergraduate Analysis. 


Remark 3. In the special case of dimension 2, one could actually have 
given a simpler proof that the determinant is equal to the area. But we 
chose to give the slightly more complicated proof because it is the one 
which generalizes to the 3-dimensional, or n-dimensional case. 


We interpret Theorem 6.1 in terms of linear maps. Given vectors v, w
in the plane, we know that there exists a unique linear map
$$L: \mathbf{R}^2 \to \mathbf{R}^2$$
such that $L(E^1) = v$ and $L(E^2) = w$. In fact, if
$$v = aE^1 + cE^2, \qquad w = bE^1 + dE^2,$$
then the matrix associated with the linear map is
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$
Furthermore, if we denote by C the unit square spanned by $E^1$, $E^2$, and by
P the parallelogram spanned by v, w, then P is the image under L of C,
that is L(C) = P. Indeed, as we have seen, for $0 \leq t_i \leq 1$ we have
$$L(t_1E^1 + t_2E^2) = t_1L(E^1) + t_2L(E^2) = t_1v + t_2w.$$
If we define the determinant of a linear map to be the determinant of its
associated matrix, we conclude that
$$(*) \qquad (\text{Area of } P) = |\mathrm{Det}(L)|.$$


To take a numerical example, the area of the parallelogram spanned by
the vectors (2, 1) and (3, -1) (Fig. 8) is equal to the absolute value of
$$\begin{vmatrix} 2 & 3 \\ 1 & -1 \end{vmatrix} = -5$$
and hence is equal to 5.

Figure 8


Theorem 6.10. Let P be a parallelogram spanned by two vectors. Let
$L: \mathbf{R}^2 \to \mathbf{R}^2$ be a linear map. Then
$$\text{Area of } L(P) = |\mathrm{Det}\, L|\,(\text{Area of } P).$$

Proof. Suppose that P is spanned by two vectors v, w. Then L(P)
is spanned by L(v) and L(w). (Cf. Fig. 9.) There is a linear map
$L_1: \mathbf{R}^2 \to \mathbf{R}^2$ such that
$$L_1(E^1) = v \qquad\text{and}\qquad L_1(E^2) = w.$$
Then $P = L_1(C)$, where C is the unit square, and
$$L(P) = L(L_1(C)) = (L \circ L_1)(C).$$

Figure 9

By what we proved above in (*), we obtain
$$\mathrm{Vol}\, L(P) = |\mathrm{Det}(L \circ L_1)| = |\mathrm{Det}(L)\,\mathrm{Det}(L_1)| = |\mathrm{Det}(L)|\,\mathrm{Vol}(P),$$
thus proving our assertion.
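Theorem 6.10 can be checked numerically by computing areas in a way that does not use determinants of the spanning vectors directly. The sketch below is an illustration only: the function names, the choice of the linear map L and the use of the polygon area formula (sum of cross terms over consecutive vertices) are assumptions made here, not part of the text.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix given by its rows."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def apply(m, v):
    """Image of the vector v under the linear map with matrix m."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])


def parallelogram(v, w):
    """Corners of the parallelogram spanned by v, w, listed in order."""
    return [(0, 0), v, (v[0] + w[0], v[1] + w[1]), w]


def polygon_area(points):
    """Area of a polygon from its corners listed in order around the boundary."""
    total = 0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


v, w = (2, 1), (3, -1)
L = [[1, 2], [0, 3]]  # hypothetical linear map with determinant 3

area_P = polygon_area(parallelogram(v, w))
area_LP = polygon_area(parallelogram(apply(L, v), apply(L, w)))
print(area_P, area_LP, abs(det2(L)) * area_P)  # 5.0 15.0 15.0
```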


Corollary 6.11. For any rectangle R with sides parallel to the axes, and
any linear map $L: \mathbf{R}^2 \to \mathbf{R}^2$ we have
$$\mathrm{Vol}\, L(R) = |\mathrm{Det}(L)|\,\mathrm{Vol}(R).$$

Proof. Let $c_1$, $c_2$ be the lengths of the sides of R. Let $R_1$ be the
rectangle spanned by $c_1E^1$ and $c_2E^2$. Then R is the translation of $R_1$ by
some vector, say $R = R_1 + u$. Then
$$L(R) = L(R_1 + u) = L(R_1) + L(u)$$
is the translation of $L(R_1)$ by L(u). (Cf. Fig. 10.) Since area does not
change under translation, we need only apply Theorem 6.10 to conclude
the proof.

Figure 10 (the rectangle $R = R_1 + u$ with u = (a, c) and corners (a, c),
$(a + c_1, c)$, $(a, c + c_2)$, $(a + c_1, c + c_2)$)

Exercises VII, §6


1. If g(v, w) satisfies the first two axioms of a determinant, prove that 
g(v, w) = —g(w, v) 


for all vectors v, w. This fact was used in the uniqueness proof. [Hint: Ex- 
pand g(v + w, v + w) = 0.] 


2. Find the area of the parallelogram spanned by the following vectors. 
(a) (2, 1) and (—4, 5) (b) (3, 4) and (—2, —3) 


3. Find the area of the parallelogram such that three corners of the parallelo- 
gram are given by the following points. 
(a) (1, 1), (2, -1), (4, 6)    (b) (-3, 2), (1, 4), (-2, -7)
(c) (2, 5), (-1, 4), (1, 2)    (d) (1, 1), (1, 0), (2, 3)


4. Find the volume of the parallelepiped spanned by the following vectors in 
3-space. 
(a) (1, 1, 3), (1, 2, — 1), (1, 4, 1) (b) (1, —1, 4), (1, 1, 0), (—1, 2, 5) 
(c) (—1, 2, 1), (2, 0, 1), (1, 3, 0) (d) (—2, 2, 1), (0, 1, 0), (—4, 3, 2) 


CHAPTER VIII 


Eigenvectors and 
Eigenvalues 


This chapter gives the basic elementary properties of eigenvectors and 
eigenvalues. We get an application of determinants in computing the 
characteristic polynomial. In §3, we also get an elegant mixture of calcu- 
lus and linear algebra by relating eigenvectors with the problem of find- 
ing the maximum and minimum of a quadratic function on the sphere. 
Most students taking linear algebra will have had some calculus, but the 
proof using complex numbers instead of the maximum principle can be 
used to get real eigenvalues of a symmetric matrix if the calculus has to 
be avoided. Basic properties of the complex numbers will be recalled in 
an appendix. 


VIII, §1. Eigenvectors and Eigenvalues


Let V be a vector space and let
$$A: V \to V$$
be a linear map of V into itself. An element $v \in V$ is called an eigenvector
of A if there exists a number $\lambda$ such that $Av = \lambda v$. If $v \neq O$ then $\lambda$ is
uniquely determined, because $\lambda_1v = \lambda_2v$ implies $\lambda_1 = \lambda_2$. In this case, we
say that $\lambda$ is an eigenvalue of A belonging to the eigenvector v. We also
say that v is an eigenvector with the eigenvalue $\lambda$. Instead of eigenvector
and eigenvalue, one also uses the terms characteristic vector and
characteristic value.

If A is a square n x n matrix then an eigenvector of A is by definition
an eigenvector of the linear map of $\mathbf{R}^n$ into itself represented by this
matrix. Thus an eigenvector X of A is a (column) vector of $\mathbf{R}^n$ for which
there exists $\lambda \in \mathbf{R}$ such that $AX = \lambda X$.


Example 1. Let V be the vector space over $\mathbf{R}$ consisting of all
infinitely differentiable functions. Let $\lambda \in \mathbf{R}$. Then the function f such that
$f(t) = e^{\lambda t}$ is an eigenvector of the derivative d/dt because $df/dt = \lambda e^{\lambda t}$.


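Example 2. Let
$$A = \begin{pmatrix} a_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & a_n \end{pmatrix}$$
be a diagonal matrix. Then every unit vector $E^i$ (i = 1,...,n) is an
eigenvector of A. In fact, we have $AE^i = a_iE^i$:
$$\begin{pmatrix} a_1 & 0 & \cdots & 0 \\ 0 & a_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_n \end{pmatrix} \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ a_i \\ \vdots \\ 0 \end{pmatrix}.$$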

Example 3. If $A: V \to V$ is a linear map, and v is an eigenvector of A,
then for any non-zero scalar c, cv is also an eigenvector of A, with the
same eigenvalue.

Theorem 1.1. Let V be a vector space and let $A: V \to V$ be a linear map.
Let $\lambda \in \mathbf{R}$. Let $V_\lambda$ be the subspace of V generated by all eigenvectors of
A having $\lambda$ as eigenvalue. Then every non-zero element of $V_\lambda$ is an
eigenvector of A having $\lambda$ as eigenvalue.

Proof. Let $v_1, v_2 \in V$ be such that $Av_1 = \lambda v_1$ and $Av_2 = \lambda v_2$. Then
$$A(v_1 + v_2) = Av_1 + Av_2 = \lambda v_1 + \lambda v_2 = \lambda(v_1 + v_2).$$
If $c \in \mathbf{R}$ then $A(cv_1) = cAv_1 = c\lambda v_1 = \lambda cv_1$. This proves our theorem.

The subspace $V_\lambda$ in Theorem 1.1 is called the eigenspace of A
belonging to $\lambda$.


Note. If $v_1$, $v_2$ are eigenvectors of A with different eigenvalues $\lambda_1 \neq \lambda_2$
then of course $v_1 + v_2$ is not an eigenvector of A. In fact, we have the
following theorem:

Theorem 1.2. Let V be a vector space and let $A: V \to V$ be a linear map.
Let $v_1,\ldots,v_m$ be eigenvectors of A, with eigenvalues $\lambda_1,\ldots,\lambda_m$
respectively. Assume that these eigenvalues are distinct, i.e.
$$\lambda_i \neq \lambda_j \qquad\text{if}\qquad i \neq j.$$
Then $v_1,\ldots,v_m$ are linearly independent.

Proof. By induction on m. For m = 1, an element $v_1 \in V$, $v_1 \neq O$ is
linearly independent. Assume m > 1. Suppose that we have a relation
$$(*) \qquad c_1v_1 + \cdots + c_mv_m = O$$
with scalars $c_i$. We must prove all $c_i = 0$. We multiply our relation (*)
by $\lambda_1$ to obtain
$$c_1\lambda_1v_1 + \cdots + c_m\lambda_1v_m = O.$$
We also apply A to our relation (*). By linearity, we obtain
$$c_1\lambda_1v_1 + \cdots + c_m\lambda_mv_m = O.$$
We now subtract these last two expressions, and obtain
$$c_2(\lambda_2 - \lambda_1)v_2 + \cdots + c_m(\lambda_m - \lambda_1)v_m = O.$$
Since $\lambda_j - \lambda_1 \neq 0$ for j = 2,...,m we conclude by induction that
$c_2 = \cdots = c_m = 0$. Going back to our original relation, we see that
$c_1v_1 = O$, whence $c_1 = 0$, and our theorem is proved.


Example 4. Let V be the vector space consisting of all differentiable
functions of a real variable t. Let $\alpha_1,\ldots,\alpha_m$ be distinct numbers. The
functions
$$e^{\alpha_1 t},\ldots,e^{\alpha_m t}$$
are eigenvectors of the derivative, with distinct eigenvalues $\alpha_1,\ldots,\alpha_m$, and
hence are linearly independent.


Remark 1. In Theorem 1.2, suppose V is a vector space of dimension
n and $A: V \to V$ is a linear map having n eigenvectors $v_1,\ldots,v_n$ whose
eigenvalues $\lambda_1,\ldots,\lambda_n$ are distinct. Then $\{v_1,\ldots,v_n\}$ is a basis of V.
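As a small check of Theorem 1.2 and Remark 1, the following Python sketch takes a 2 x 2 matrix with two distinct eigenvalues, verifies the eigenvector relation for each, and confirms that the two eigenvectors are linearly independent by the determinant test. The matrix, the vectors and the helper names are assumptions chosen here for illustration.

```python
def apply(a, v):
    """Product of the matrix a (given by rows) with the column vector v."""
    return tuple(sum(a[i][j] * v[j] for j in range(len(v)))
                 for i in range(len(a)))


def has_eigenvalue(a, v, lam):
    """True if v is a non-zero vector with a v = lam v."""
    return any(x != 0 for x in v) and apply(a, v) == tuple(lam * x for x in v)


A = [[2, 1],
     [0, 3]]
v1, v2 = (1, 0), (1, 1)

# v1 belongs to the eigenvalue 2 and v2 to the eigenvalue 3.
print(has_eigenvalue(A, v1, 2), has_eigenvalue(A, v2, 3))  # True True

# The determinant with columns v1, v2 is non-zero, so {v1, v2} is a basis.
d = v1[0] * v2[1] - v2[0] * v1[1]
print(d)  # 1
```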


Remark 2. One meets a situation like that of Theorem 1.2 in the
theory of linear differential equations. Let $A = (a_{ij})$ be an n x n matrix,
and let
$$F(t) = \begin{pmatrix} f_1(t) \\ \vdots \\ f_n(t) \end{pmatrix}$$
be a column vector of functions satisfying the equation
$$\frac{dF}{dt} = AF(t).$$
In terms of the coordinates, this means that
$$\frac{df_i}{dt} = \sum_{j=1}^{n} a_{ij}f_j(t).$$
Now suppose that A is a diagonal matrix,
$$A = \begin{pmatrix} a_1 & 0 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_n \end{pmatrix} \qquad\text{with } a_i \neq 0 \text{ all } i.$$
Then each function $f_i(t)$ satisfies the equation
$$\frac{df_i}{dt} = a_if_i(t).$$
By calculus, there exist numbers $c_1,\ldots,c_n$ such that for i = 1,...,n we
have
$$f_i(t) = c_ie^{a_it}.$$
[Proof: if $df/dt = af(t)$, then the derivative of $f(t)/e^{at}$ is 0, so $f(t)/e^{at}$ is
constant.] Conversely, if $c_1,\ldots,c_n$ are numbers, and we let
$$F(t) = \begin{pmatrix} c_1e^{a_1t} \\ \vdots \\ c_ne^{a_nt} \end{pmatrix},$$
then F(t) satisfies the differential equation
$$\frac{dF}{dt} = AF(t).$$
Let V be the set of solutions F(t) for the differential equation
$dF/dt = AF(t)$. Then V is immediately verified to be a vector space, and
the above argument shows that the n elements
$$\begin{pmatrix} e^{a_1t} \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ e^{a_2t} \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad \begin{pmatrix} 0 \\ 0 \\ \vdots \\ e^{a_nt} \end{pmatrix}$$
form a basis for V. Furthermore, these elements are eigenvectors of A,
and also of the derivative (viewed as a linear map).

The above is valid if A is a diagonal matrix. If A is not diagonal, 
then we try to find a basis such that we can represent the linear map A 
by a diagonal matrix. We do not go into this type of consideration here. 


Exercises VIII, §1 


Let a be a number $\neq 0$.


1. Prove that the eigenvectors of the matrix 


generate a 1-dimensional space, and give a basis for this space. 


2. Prove that the eigenvectors of the matrix 


b y 


generate a 2-dimensional space and give a basis for this space. What are the 
eigenvalues of this matrix? 


3. Let A be a diagonal matrix with diagonal elements $a_{11},\ldots,a_{nn}$. What is the
dimension of the space generated by the eigenvectors of A? Exhibit a basis
for this space, and give the eigenvalues.


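4. Show that if $\theta \in \mathbf{R}$, then the matrix

$$A = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}$$

always has an eigenvector in $\mathbf{R}^2$, and in fact that there exists a vector $v_1$ such
that $Av_1 = v_1$. [Hint: Let the first component of $v_1$ be

$$x = \frac{\sin\theta}{1 - \cos\theta}$$

if $\cos\theta \neq 1$. Then solve for y. What if $\cos\theta = 1$?]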


5. In Exercise 4, let $v_2$ be a vector of $\mathbf{R}^2$ perpendicular to the vector $v_1$ found in
that exercise. Show that $Av_2 = -v_2$. Define this to mean that A is a reflection.


6. Let

$$R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

be the matrix of a rotation. Show that $R(\theta)$ does not have any real eigenvalues
unless $R(\theta) = \pm I$. [It will be easier to do this exercise after you have
read the next section.]


7. Let V be a finite dimensional vector space. Let A, B be linear maps of V into
itself. Assume that AB = BA. Show that if v is an eigenvector of A, with
eigenvalue $\lambda$, then Bv is an eigenvector of A, with eigenvalue $\lambda$ also, if $Bv \neq O$.


VIII, §2. The Characteristic Polynomial


We shall now see how we can use determinants to find the eigenvalues of
a matrix.


Theorem 2.1. Let V be a finite dimensional vector space, and let $\lambda$ be a
number. Let $A\colon V \to V$ be a linear map. Then $\lambda$ is an eigenvalue of A if
and only if $A - \lambda I$ is not invertible.


Proof. Assume that $\lambda$ is an eigenvalue of A. Then there exists an
element $v \in V$, $v \neq O$ such that $Av = \lambda v$. Hence $Av - \lambda v = O$, and
$(A - \lambda I)v = O$. Hence $A - \lambda I$ has a non-zero kernel, and $A - \lambda I$ cannot
be invertible. Conversely, assume that $A - \lambda I$ is not invertible. By
Theorem 2.4 of Chapter 5, we see that $A - \lambda I$ must have a non-zero
kernel, meaning that there exists an element $v \in V$, $v \neq O$ such that
$(A - \lambda I)v = O$. Hence $Av - \lambda v = O$, and $Av = \lambda v$. Thus $\lambda$ is an
eigenvalue of A. This proves our theorem.


Let A be an n x n matrix, $A = (a_{ij})$. We define the characteristic
polynomial $P_A$ of A to be the determinant

$$P_A(t) = \mathrm{Det}(tI - A),$$

or written out in full,

$$P_A(t) = \begin{vmatrix} t - a_{11} & -a_{12} & \cdots & -a_{1n} \\ \vdots & \vdots & & \vdots \\ -a_{n1} & -a_{n2} & \cdots & t - a_{nn} \end{vmatrix}.$$

We can also view A as a linear map from $\mathbf{R}^n$ to $\mathbf{R}^n$, and we also say
that $P_A(t)$ is the characteristic polynomial of this linear map.


Example 1. The characteristic polynomial of the matrix 
which we expand according to the first column, to find

$$P_A(t) = t^3 - t^2 - 4t + 6.$$


For an arbitrary matrix $A = (a_{ij})$, the characteristic polynomial can be
found by expanding according to the first column, and will always
consist of a sum

$$(t - a_{11}) \cdots (t - a_{nn}) + \cdots$$

Each term other than the one we have written down will have degree
$< n$. Hence the characteristic polynomial is of type

$$P_A(t) = t^n + \text{terms of lower degree}.$$


Theorem 2.2. Let A be an n x n matrix. A number $\lambda$ is an eigenvalue
of A if and only if $\lambda$ is a root of the characteristic polynomial of A.


Proof. Assume that $\lambda$ is an eigenvalue of A. Then $\lambda I - A$ is not
invertible by Theorem 2.1, and hence $\mathrm{Det}(\lambda I - A) = 0$, by Theorem 3.1
of Chapter VII and Theorem 2.5 of Chapter V. Consequently $\lambda$ is a root
of the characteristic polynomial. Conversely, if $\lambda$ is a root of the
characteristic polynomial, then

$$\mathrm{Det}(\lambda I - A) = 0,$$

and hence by the same Theorem 3.1 of Chapter VII we conclude that
$\lambda I - A$ is not invertible. Hence $\lambda$ is an eigenvalue of A by Theorem 2.1.


Theorem 2.2 gives us an explicit way of determining the eigenvalues of 
a matrix, provided that we can determine explicitly the roots of its char- 
acteristic polynomial. This is sometimes easy, especially in exercises at 
the end of chapters when the matrices are adjusted in such a way that 
one can determine the roots by inspection, or simple devices. It is con- 
siderably harder in other cases. 

For instance, to determine the roots of the polynomial in Example 1, 
one would have to develop the theory of cubic polynomials. This can be 
done, but it involves formulas which are somewhat harder than the for- 
mula needed to solve a quadratic equation. One can also find methods 
to determine roots approximately. In any case, the determination of such 
methods belongs to another range of ideas than that studied in the 
present chapter. 
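
As a small illustration of determining a root approximately, here is a sketch that uses bisection (one of several possible methods; the text does not single one out). It takes the polynomial of Example 1 to be $t^3 - t^2 - 4t + 6$; the function names and the interval $[-3, -2]$ are choices made for this illustration, justified by $P(-3) = -18 < 0 < 2 = P(-2)$.

```python
def P(t):
    """Characteristic polynomial of Example 1: t^3 - t^2 - 4t + 6."""
    return t**3 - t**2 - 4*t + 6

def bisect(f, lo, hi, tol=1e-12):
    """Find a root of f in [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    if f(lo) * f(hi) > 0:
        raise ValueError("f must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid      # the sign change is in the left half
        else:
            lo = mid      # the sign change is in the right half
    return (lo + hi) / 2

root = bisect(P, -3.0, -2.0)
print(root, P(root))
```

Bisection only needs a sign change and continuity, which is why it is used here instead of a faster method with more delicate convergence conditions.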
Example 2. Find the eigenvalues and a basis for the eigenspaces of the
matrix

$$\begin{pmatrix} 1 & 4 \\ 2 & 3 \end{pmatrix}.$$

The characteristic polynomial is the determinant

$$\begin{vmatrix} t - 1 & -4 \\ -2 & t - 3 \end{vmatrix} = (t - 1)(t - 3) - 8 = t^2 - 4t - 5 = (t - 5)(t + 1).$$

Hence the eigenvalues are 5, −1.

For any eigenvalue $\lambda$, a corresponding eigenvector is a vector $\begin{pmatrix} x \\ y \end{pmatrix}$
such that

$$x + 4y = \lambda x,$$
$$2x + 3y = \lambda y,$$

or equivalently

$$(1 - \lambda)x + 4y = 0,$$
$$2x + (3 - \lambda)y = 0.$$

We give x some value, say x = 1, and solve for y from either equation,
for instance the second to get $y = -2/(3 - \lambda)$. This gives us the
eigenvector

$$X(\lambda) = \begin{pmatrix} 1 \\ -2/(3 - \lambda) \end{pmatrix}.$$

Substituting $\lambda = 5$ and $\lambda = -1$ gives us the two eigenvectors

$$X^1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \text{ for } \lambda = 5, \quad \text{and} \quad X^2 = \begin{pmatrix} 1 \\ -\tfrac{1}{2} \end{pmatrix} \text{ for } \lambda = -1.$$

The eigenspace for 5 has basis $X^1$ and the eigenspace for −1 has basis
$X^2$. Note that any non-zero scalar multiples of these vectors would also
be bases. For instance, instead of $X^2$ we could take

$$\begin{pmatrix} 2 \\ -1 \end{pmatrix}.$$
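
The computation in Example 2 can be verified with exact rational arithmetic. This is a sketch only; the helper names `apply` and `eigenvector` are introduced here for illustration and do not come from the text.

```python
from fractions import Fraction

A = [[1, 4],
     [2, 3]]

def apply(M, v):
    """Matrix-vector product M v with exact rational arithmetic."""
    return [sum(Fraction(m) * x for m, x in zip(row, v)) for row in M]

def eigenvector(lam):
    """X(lambda) = (1, -2/(3 - lambda)), valid when lambda != 3."""
    lam = Fraction(lam)
    return [Fraction(1), Fraction(-2) / (3 - lam)]

checks = {}
for lam in (5, -1):
    X = eigenvector(lam)
    # Record the eigenvector and whether A X equals lambda X exactly.
    checks[lam] = (X, apply(A, X) == [lam * x for x in X])
    print(lam, X, checks[lam][1])
```

Using `Fraction` avoids rounding, so the equality test `A X == lambda X` is exact rather than approximate.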


Example 3. Find the eigenvalues and a basis for the eigenspaces of the
matrix

$$\begin{pmatrix} 2 & 1 & 0 \\ 0 & 1 & -1 \\ 0 & 2 & 4 \end{pmatrix}.$$

The characteristic polynomial is the determinant

$$\begin{vmatrix} t - 2 & -1 & 0 \\ 0 & t - 1 & 1 \\ 0 & -2 & t - 4 \end{vmatrix} = (t - 2)^2 (t - 3).$$

Hence the eigenvalues are 2 and 3.
For the eigenvectors, we must solve the equations

$$(2 - \lambda)x + y = 0,$$
$$(1 - \lambda)y - z = 0,$$
$$2y + (4 - \lambda)z = 0.$$

Note the coefficient $(2 - \lambda)$ of x.

Suppose we want to find the eigenspace with eigenvalue $\lambda = 2$. Then
the first equation becomes y = 0, whence z = 0 from the second
equation. We can give x any value, say x = 1. Then the vector

$$X^1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$$

is a basis for the eigenspace with eigenvalue 2.

Now suppose $\lambda \neq 2$, so $\lambda = 3$. If we put x = 1 then we can solve for
y from the first equation to give y = 1, and then we can solve for z in
the second equation, to get z = −2. Hence

$$X^2 = \begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix}$$

is a basis for the eigenvectors with eigenvalue 3. Any non-zero scalar
multiple of $X^2$ would also be a basis.


Example 4. The characteristic polynomial of the matrix

$$\begin{pmatrix} 1 & 1 & 2 \\ 0 & 5 & -1 \\ 0 & 0 & 7 \end{pmatrix}$$

is $(t - 1)(t - 5)(t - 7)$. Can you generalize this?
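
For experimenting with other matrices, the coefficients of $\mathrm{Det}(tI - A)$ can be computed mechanically. The sketch below uses the Faddeev-LeVerrier recursion, which is not the expansion by columns used in the text but a different standard technique; the function name `char_poly` is an assumption of this illustration.

```python
from fractions import Fraction

def char_poly(A):
    """Coefficients of Det(tI - A), highest degree first, via Faddeev-LeVerrier."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    M = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # M_1 = I
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        AM = [[sum(A[i][r] * M[r][j] for r in range(n)) for j in range(n)]
              for i in range(n)]
        c = -sum(AM[i][i] for i in range(n)) / k   # c = -trace(A M_k) / k
        coeffs.append(c)
        # Next matrix in the recursion: M_{k+1} = A M_k + c I.
        M = [[AM[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return coeffs

triangular = [[1, 1, 2],
              [0, 5, -1],
              [0, 0, 7]]
print(char_poly(triangular))  # coefficients of t^3 - 13t^2 + 47t - 35
```

Expanding $(t - 1)(t - 5)(t - 7)$ by hand gives $t^3 - 13t^2 + 47t - 35$, which agrees with the coefficients computed above.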


Example 5. Find the eigenvalues and a basis for the eigenspaces of the
matrix in Example 4.

The eigenvalues are 1, 5, and 7. Let X be a non-zero eigenvector, say

$$X = \begin{pmatrix} x \\ y \\ z \end{pmatrix} \quad \text{also written} \quad X = {}^t(x, y, z).$$

Then by definition of an eigenvector, there is a number $\lambda$ such that
$AX = \lambda X$, which means

$$x + y + 2z = \lambda x,$$
$$5y - z = \lambda y,$$
$$7z = \lambda z.$$

Case 1. z = 0, y = 0. Since we want a non-zero eigenvector we must
then have $x \neq 0$, in which case $\lambda = 1$ by the first equation. Let $X^1 = E^1$
be the first unit vector, or any non-zero scalar multiple to get an
eigenvector with eigenvalue 1.

Case 2. z = 0, $y \neq 0$. By the second equation, we must have $\lambda = 5$.
Give y a specific value, say y = 1. Then solve the first equation for x,
namely

$$x + 1 = 5x, \quad \text{which gives} \quad x = \tfrac{1}{4}.$$

Let

$$X^2 = \begin{pmatrix} \tfrac{1}{4} \\ 1 \\ 0 \end{pmatrix}.$$

Then $X^2$ is an eigenvector with eigenvalue 5.

Case 3. $z \neq 0$. Then from the third equation, we must have $\lambda = 7$.
Fix some non-zero value of z, say z = 1. Then we are reduced to
solving the two simultaneous equations

$$x + y + 2 = 7x,$$
$$5y - 1 = 7y.$$

This yields $y = -\tfrac{1}{2}$ and $x = \tfrac{1}{4}$. Let

$$X^3 = \begin{pmatrix} \tfrac{1}{4} \\ -\tfrac{1}{2} \\ 1 \end{pmatrix}.$$

Then $X^3$ is an eigenvector with eigenvalue 7.

Scalar multiples of $X^1$, $X^2$, $X^3$ will yield eigenvectors with the same
eigenvalues as $X^1$, $X^2$, $X^3$ respectively. Since these three vectors have
distinct eigenvalues, they are linearly independent, and so form a basis of
$\mathbf{R}^3$. By Exercise 14, there are no other eigenvectors.
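
The three cases of Example 5 can be checked exactly, together with the linear independence of the resulting vectors. This is a sketch; the helpers `apply` and `det3` and the dictionary `pairs` are names chosen for this illustration.

```python
from fractions import Fraction as Fr

A = [[1, 1, 2],
     [0, 5, -1],
     [0, 0, 7]]

# Eigenvalue -> eigenvector found in Cases 1, 2, 3.
pairs = {
    1: [Fr(1), Fr(0), Fr(0)],
    5: [Fr(1, 4), Fr(1), Fr(0)],
    7: [Fr(1, 4), Fr(-1, 2), Fr(1)],
}

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def det3(c1, c2, c3):
    """Determinant of the 3 x 3 matrix with columns c1, c2, c3."""
    return (c1[0] * (c2[1] * c3[2] - c3[1] * c2[2])
            - c2[0] * (c1[1] * c3[2] - c3[1] * c1[2])
            + c3[0] * (c1[1] * c2[2] - c2[1] * c1[2]))

all_eigen = all(apply(A, X) == [lam * x for x in X] for lam, X in pairs.items())
independence_det = det3(pairs[1], pairs[5], pairs[7])
print(all_eigen, independence_det)  # a non-zero determinant means the vectors form a basis
```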


Finally we point out that the linear algebra of matrices could have 
been carried out with complex coefficients. The same goes for determin- 
ants. All that is needed about numbers is that one can add, multiply, 
and divide by non-zero numbers, and these operations are valid with 
complex numbers. Then a matrix $A = (a_{ij})$ of complex numbers has
eigenvalues and eigenvectors whose components are complex numbers. 
This is useful because of the following fundamental fact: 


Every non-constant polynomial with complex coefficients has a complex 
root. 


If A is a complex n x n matrix, then the characteristic polynomial of A 
has complex coefficients, and has degree $n \geq 1$, so has a complex root
which is an eigenvalue. Thus over the complex numbers, a square matrix 
always has an eigenvalue, and a non-zero eigenvector. This is not always 
true over the real numbers. (Example?) In the next section, we shall see 
an important case when a real matrix always has a real eigenvalue. 

We now give examples of computations using complex numbers for 
the eigenvalues and eigenvectors, even though the matrix itself has real 
components. It should be remembered that in the case of complex eigen- 
values, the vector space is over the complex numbers, so it consists of 
linear combinations of the given basis elements with complex coefficients. 


Example 6. Find the eigenvalues and a basis for the eigenspaces of the
matrix

$$A = \begin{pmatrix} 2 & -1 \\ 3 & 1 \end{pmatrix}.$$

The characteristic polynomial is the determinant

$$\begin{vmatrix} t - 2 & 1 \\ -3 & t - 1 \end{vmatrix} = (t - 2)(t - 1) + 3 = t^2 - 3t + 5.$$

Hence the eigenvalues are

$$\frac{3 \pm \sqrt{9 - 20}}{2}.$$

Thus there are two distinct eigenvalues (but no real eigenvalue):

$$\lambda_1 = \frac{3 + \sqrt{-11}}{2} \quad \text{and} \quad \lambda_2 = \frac{3 - \sqrt{-11}}{2}.$$

Let $X = \begin{pmatrix} x \\ y \end{pmatrix}$ with not both x, y equal to 0. Then X is an eigenvector if
and only if $AX = \lambda X$, that is:

$$2x - y = \lambda x,$$
$$3x + y = \lambda y,$$

where $\lambda$ is an eigenvalue. This system is equivalent with

$$(2 - \lambda)x - y = 0,$$
$$3x + (1 - \lambda)y = 0.$$

We give x, say, an arbitrary value, for instance x = 1 and solve for y, so
$y = 2 - \lambda$ from the first equation. Then we obtain the eigenvectors

$$X(\lambda_1) = \begin{pmatrix} 1 \\ 2 - \lambda_1 \end{pmatrix} \quad \text{and} \quad X(\lambda_2) = \begin{pmatrix} 1 \\ 2 - \lambda_2 \end{pmatrix}.$$

Remark. We solved for y from one of the equations. This is
consistent with the other because $\lambda$ is an eigenvalue. Indeed, if you
substitute x = 1 and $y = 2 - \lambda$ on the left in the second equation, you
get

$$3 + (1 - \lambda)(2 - \lambda) = 0$$

because $\lambda$ is a root of the characteristic polynomial.

Then $X(\lambda_1)$ is a basis for the one-dimensional eigenspace of $\lambda_1$, and
$X(\lambda_2)$ is a basis for the one-dimensional eigenspace of $\lambda_2$.
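
A quick numerical check of Example 6 with complex arithmetic follows. It is a sketch: the function name `residual` is introduced here, and the check is approximate because floating-point complex numbers are used.

```python
import cmath

A = [[2, -1],
     [3, 1]]

# Roots of t^2 - 3t + 5 by the quadratic formula; cmath.sqrt accepts a negative argument.
disc = cmath.sqrt(9 - 20)
lam1, lam2 = (3 + disc) / 2, (3 - disc) / 2

def residual(lam):
    """Largest coordinate of A X - lam X in absolute value, for X = (1, 2 - lam)."""
    X = [1, 2 - lam]
    AX = [sum(a * x for a, x in zip(row, X)) for row in A]
    return max(abs(u - lam * x) for u, x in zip(AX, X))

print(lam1, residual(lam1))
print(lam2, residual(lam2))
```

Both residuals should be zero up to rounding, which confirms that solving only the first equation for y was enough.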


Example 7. Find the eigenvalues and a basis for the eigenspaces of the
matrix

$$A = \begin{pmatrix} 1 & 1 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}.$$

We compute the characteristic polynomial, which is the determinant

$$\begin{vmatrix} t - 1 & -1 & 1 \\ 0 & t - 1 & 0 \\ -1 & 0 & t - 1 \end{vmatrix}$$

easily computed to be

$$P(t) = (t - 1)(t^2 - 2t + 2).$$

Now we meet the problem of finding the roots of P(t) as real numbers
or complex numbers. By the quadratic formula, the roots of $t^2 - 2t + 2$
are given by

$$\frac{2 \pm \sqrt{4 - 8}}{2} = 1 \pm \sqrt{-1}.$$

The whole theory of linear algebra could have been done over the
complex numbers, and the eigenvalues of the given matrix can also be
defined over the complex numbers. Then from the computation of
the roots above, we see that the only real eigenvalue is 1; and that
there are two complex eigenvalues, namely

$$1 + \sqrt{-1} \quad \text{and} \quad 1 - \sqrt{-1}.$$

We let these eigenvalues be

$$\lambda_1 = 1 + \sqrt{-1} \quad \text{and} \quad \lambda_2 = 1 - \sqrt{-1}.$$

Let

$$X = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$$

be a non-zero vector. Then X is an eigenvector for A if and only if the
following equations are satisfied with some eigenvalue $\lambda$:

$$x + y - z = \lambda x,$$
$$y = \lambda y,$$
$$x + z = \lambda z.$$

This system is equivalent with

$$(1 - \lambda)x + y - z = 0,$$
$$(1 - \lambda)y = 0,$$
$$x + (1 - \lambda)z = 0.$$

Case 1. $\lambda = 1$. Then the second equation will hold for any value of y.
Let us put y = 1. From the first equation we get z = 1, and from the
third equation we get x = 0. Hence we get a first eigenvector

$$X^1 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}.$$

Case 2. $\lambda \neq 1$. Then from the second equation we must have y = 0.
Now we solve the system arising from the first and third equations:

$$(1 - \lambda)x - z = 0,$$
$$x + (1 - \lambda)z = 0.$$

If these equations were independent, then the only solutions would be
x = z = 0. This cannot be the case, since there must be a non-zero
eigenvector with the given eigenvalue. Actually you can check directly
that the second equation is equal to $(\lambda - 1)$ times the first. In any
case, we give one of the variables an arbitrary value, and solve for the
other. For instance, let z = 1. Then $x = 1/(1 - \lambda)$. Thus we get the
eigenvector

$$X(\lambda) = \begin{pmatrix} 1/(1 - \lambda) \\ 0 \\ 1 \end{pmatrix}.$$

We can substitute $\lambda = \lambda_1$ and $\lambda = \lambda_2$ to get the eigenvectors with the
eigenvalues $\lambda_1$ and $\lambda_2$ respectively.
In this way we have found three eigenvectors with distinct eigenvalues,
namely

$$X^1, \quad X(\lambda_1), \quad X(\lambda_2).$$


Example 8. Find the eigenvalues and a basis for the eigenspaces of the
matrix

$$\begin{pmatrix} 1 & -1 & 2 \\ -2 & 1 & 3 \\ 1 & -1 & 1 \end{pmatrix}.$$

The characteristic polynomial is

$$\begin{vmatrix} t - 1 & 1 & -2 \\ 2 & t - 1 & -3 \\ -1 & 1 & t - 1 \end{vmatrix} = (t - 1)^3 - (t - 1) - 1.$$

The eigenvalues are the roots of this cubic equation. In general it is not
easy to find such roots, and this is the case in the present instance. Let
$u = t - 1$. In terms of u the polynomial can be written

$$Q(u) = u^3 - u - 1.$$

From arithmetic, the only rational roots must be integers, and must
divide 1, so the only possible rational roots are $\pm 1$, which are not roots.
Hence there is no rational eigenvalue. But a cubic equation has the
general shape as shown on the figure:

Figure 1

This means that there is at least one real root. If you know calculus,
then you have tools to be able to determine the relative maximum and
relative minimum. You will find that the function $u^3 - u - 1$ has its
relative maximum at $u = -1/\sqrt{3}$, and that $Q(-1/\sqrt{3})$ is negative. Hence
there is only one real root. The other two roots are complex. This is as
far as we are able to go with the means at hand. In any case, we give
these roots a name, and let the eigenvalues be

$$\lambda_1, \lambda_2, \lambda_3.$$

They are all distinct.

We can, however, find the eigenvectors in terms of the eigenvalues.
Let

$$X = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$$

be a non-zero vector. Then X is an eigenvector if and only if $AX = \lambda X$,
that is:

$$x - y + 2z = \lambda x,$$
$$-2x + y + 3z = \lambda y,$$
$$x - y + z = \lambda z.$$

This system of equations is equivalent with

$$(1 - \lambda)x - y + 2z = 0,$$
$$-2x + (1 - \lambda)y + 3z = 0,$$
$$x - y + (1 - \lambda)z = 0.$$

We give z an arbitrary value, say z = 1 and solve for x and y using the
first two equations. Thus we must solve:

$$(\lambda - 1)x + y = 2,$$
$$2x + (\lambda - 1)y = 3.$$

Multiply the first equation by 2, the second by $(\lambda - 1)$ and subtract.
Then we can solve for y to get

$$y(\lambda) = \frac{3(\lambda - 1) - 4}{(\lambda - 1)^2 - 2}.$$

From the first equation we find

$$x(\lambda) = \frac{2 - y(\lambda)}{\lambda - 1}.$$

Hence eigenvectors are

$$X(\lambda_1) = \begin{pmatrix} x(\lambda_1) \\ y(\lambda_1) \\ 1 \end{pmatrix}, \quad X(\lambda_2) = \begin{pmatrix} x(\lambda_2) \\ y(\lambda_2) \\ 1 \end{pmatrix}, \quad X(\lambda_3) = \begin{pmatrix} x(\lambda_3) \\ y(\lambda_3) \\ 1 \end{pmatrix},$$

where $\lambda_1, \lambda_2, \lambda_3$ are the three eigenvalues. This is an explicit answer to
the extent that you are able to determine these eigenvalues. By machine
or a computer, you can use means to get approximations to $\lambda_1, \lambda_2, \lambda_3$
which will give you corresponding approximations to the three
eigenvectors. Observe that we have found here the complex eigenvectors. Let $\lambda_1$
be the real eigenvalue (we have seen that there is only one). Then from
the formulas for the coordinates of $X(\lambda)$, we see that $y(\lambda)$ or $x(\lambda)$ will be
real if and only if $\lambda$ is real. Hence there is only one real eigenvector
namely $X(\lambda_1)$. The other two eigenvectors are complex. Each
eigenvector is a basis for the corresponding eigenspace.
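
To make the remark about machine approximations concrete, here is a sketch that approximates the three eigenvalues of Example 8 and checks the eigenvector formulas. The method is an assumption of this illustration, not something prescribed by the text: the real root of $Q(u) = u^3 - u - 1$ is found by bisection on $[1, 2]$, and the other two roots come from the quadratic factor left after dividing $Q(u)$ by $u - r$.

```python
import cmath

A = [[1, -1, 2],
     [-2, 1, 3],
     [1, -1, 1]]

def Q(u):
    return u**3 - u - 1

# Real root of Q by bisection on [1, 2], where Q(1) = -1 < 0 < 5 = Q(2).
lo, hi = 1.0, 2.0
for _ in range(60):
    mid = (lo + hi) / 2
    if Q(mid) < 0:
        lo = mid
    else:
        hi = mid
r = (lo + hi) / 2

# If Q(r) = 0 then Q(u) = (u - r)(u^2 + r u + r^2 - 1); the quadratic factor
# has discriminant 4 - 3 r^2 < 0 and gives the two complex roots.
s = cmath.sqrt(4 - 3 * r * r)
us = [complex(r), (-r + s) / 2, (-r - s) / 2]
eigenvalues = [u + 1 for u in us]   # undo the substitution u = t - 1

def eigenvector(lam):
    """X(lambda) = (x(lambda), y(lambda), 1) from the formulas of the example."""
    y = (3 * (lam - 1) - 4) / ((lam - 1) ** 2 - 2)
    x = (2 - y) / (lam - 1)
    return [x, y, 1]

def residual(lam):
    """Largest coordinate of A X - lam X in absolute value."""
    X = eigenvector(lam)
    AX = [sum(a * x for a, x in zip(row, X)) for row in A]
    return max(abs(p - lam * q) for p, q in zip(AX, X))

for lam in eigenvalues:
    print(lam, residual(lam))
```

The third equation of the system is never used to build $X(\lambda)$, so a small residual in all three coordinates is a genuine check that the formulas are consistent.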


Theorem 2.3. Let A, B be two n x n matrices, and assume that B is
invertible. Then the characteristic polynomial of A is equal to the
characteristic polynomial of $B^{-1}AB$.


Proof. By definition, and properties of the determinant,

$$\mathrm{Det}(tI - A) = \mathrm{Det}(B^{-1}(tI - A)B) = \mathrm{Det}(tB^{-1}B - B^{-1}AB) = \mathrm{Det}(tI - B^{-1}AB).$$


This proves what we wanted. 
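
Theorem 2.3 can be illustrated in the 2 x 2 case, where $\mathrm{Det}(tI - M) = t^2 - (\text{trace of } M)\,t + \mathrm{Det}(M)$. The sketch below is restricted to that case; the matrix B is an arbitrary invertible matrix chosen for illustration, and the helper names are not from the text.

```python
from fractions import Fraction

def matmul(P, Q):
    """Product of two 2 x 2 matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(B):
    """Inverse of a 2 x 2 matrix with non-zero determinant."""
    (a, b), (c, d) = B
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def char_poly(M):
    """Coefficients (1, -trace, determinant) of Det(tI - M) for a 2 x 2 matrix M."""
    (a, b), (c, d) = M
    return (1, -(a + d), a * d - b * c)

A = [[1, 4], [2, 3]]      # the matrix of Example 2
B = [[2, 1], [1, 1]]      # an arbitrary invertible matrix, chosen for illustration
similar = matmul(inverse(B), matmul(A, B))
print(char_poly(A), char_poly(similar))
```

Here $B^{-1}AB$ has different entries from A, yet both characteristic polynomials come out as $t^2 - 4t - 5$.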
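Exercises VIII, §2


1. Let A be a diagonal matrix,

$$A = \begin{pmatrix} a_1 & 0 & \cdots & 0 \\ 0 & a_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_n \end{pmatrix}.$$

(a) What is the characteristic polynomial of A?
(b) What are its eigenvalues?


2. Let A be a triangular matrix,

$$A = \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}.$$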

What is the characteristic polynomial of A, and what are its eigenvalues? 


Find the characteristic polynomial, eigenvalues, and bases for the eigenspaces
of the following matrices.


1 2 

3. (a) [ J 
—2 -7 
eC 3) 


4. 4 0 1 p 5 
aj|-2 1 0 [3 -5 3 


E. «Ud —6 4 
3 1 1 |i 2 2 
(){2 4 2 (| 1 2 -1 
1 1 3 = 1 4 


5. Find the eigenvalues and eigenvectors of the following matrices. Show that
the eigenvectors form a 1-dimensional space.


2 =l 1 1 2 0 2 
(a) f d (b) (o j Q ( à (d) ( 


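6. Find the eigenvalues and eigenvectors of the following matrices. Show that
the eigenvectors form a 1-dimensional space.

$$\text{(a)} \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} \qquad \text{(b)} \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}$$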
7. Find the eigenvalues and a basis for the eigenspaces of the following
matrices.
1 
a 4 a =~ we 4 
bí-1 3 0 
Gio o po 4 (5) 
zd. dx ed 
i 0 0 0 


8. Find the eigenvalues and a basis for the eigenspaces for the following 
matrices. 


2- 4 p ? 3 2 
@(5 i e 3) e(5 3) 

ÉL 2: 2 3. 2 4 ME E 
(d) | 2 2 | (e) f 1 | (f) [- 4 0 

—3 —6 —6 0 1 -1 —3 P 3 


9. Let V be an n-dimensional vector space and assume that the characteristic
polynomial of a linear map $A\colon V \to V$ has n distinct roots. Show that V has a
basis consisting of eigenvectors of A.


10. Let A be a square matrix. Show that the eigenvalues of ${}^tA$ are the same as
those of A.


11. Let A be an invertible matrix. If $\lambda$ is an eigenvalue of A show that $\lambda \neq 0$
and that $\lambda^{-1}$ is an eigenvalue of $A^{-1}$.


12. Let V be the space generated over $\mathbf{R}$ by the two functions sin t and cos t.
Does the derivative (viewed as a linear map of V into itself) have any
non-zero eigenvectors in V? If so, which?


13. Let D denote the derivative which we view as a linear map on the space of
differentiable functions. Let k be an integer $\neq 0$. Show that the functions
sin kx and cos kx are eigenvectors for $D^2$. What are the eigenvalues?


14. Let $A\colon V \to V$ be a linear map of V into itself, and let $\{v_1,\ldots,v_n\}$ be a basis of
V consisting of eigenvectors having distinct eigenvalues $c_1,\ldots,c_n$. Show that
any eigenvector v of A in V is a scalar multiple of some $v_i$.


15. Let A, B be square matrices of the same size. Show that the eigenvalues of
AB are the same as the eigenvalues of BA.


VIII, §3. Eigenvalues and Eigenvectors of Symmetric
Matrices


We shall give two proofs of the following theorem. 


Theorem 3.1. Let A be a symmetric n x n real matrix. Then there 
exists a non-zero real eigenvector for A. 
One of the proofs will use the complex numbers, and the other proof
will use calculus. Let us start with the calculus proof.
Define the function

$$f(X) = {}^tXAX \quad \text{for } X \in \mathbf{R}^n.$$

Such a function f is called the quadratic form associated with A. If
$X = {}^t(x_1,\ldots,x_n)$ is written in terms of coordinates, and $A = (a_{ij})$ then

$$f(X) = \sum_{i,j=1}^{n} a_{ij} x_i x_j.$$


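Example. Let

$$A = \begin{pmatrix} 3 & -1 \\ -1 & 2 \end{pmatrix}.$$

Let ${}^tX = (x, y)$. Then

$${}^tXAX = (x, y) \begin{pmatrix} 3 & -1 \\ -1 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = 3x^2 - 2xy + 2y^2.$$

More generally, let

$$A = \begin{pmatrix} a & b \\ b & d \end{pmatrix}.$$

Then

$$(x, y) \begin{pmatrix} a & b \\ b & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = ax^2 + 2bxy + dy^2.$$


Example. Suppose we are given a quadratic expression

$$f(x, y) = 3x^2 + 5xy - 4y^2.$$

Then it is the quadratic form associated with the symmetric matrix

$$A = \begin{pmatrix} 3 & \tfrac{5}{2} \\ \tfrac{5}{2} & -4 \end{pmatrix}.$$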
In many applications, one wants to find a maximum for such a
function f on the unit sphere. Recall that the unit sphere is the set of all
points X such that $\|X\| = 1$, where $\|X\| = \sqrt{X \cdot X}$. It is shown in
analysis courses that a continuous function f as above necessarily has a
maximum on the sphere. A maximum on the unit sphere is a point P such
that $\|P\| = 1$ and

$$f(P) \geq f(X) \quad \text{for all } X \text{ with } \|X\| = 1.$$

The next theorem relates this problem with the problem of finding
eigenvectors.


Theorem 3.2. Let A be a real symmetric matrix, and let $f(X) = {}^tXAX$
be the associated quadratic form. Let P be a point on the unit sphere
such that f(P) is a maximum for f on the sphere. Then P is an
eigenvector for A. In other words, there exists a number $\lambda$ such that
$AP = \lambda P$.


Proof. Let W be the subspace of $\mathbf{R}^n$ orthogonal to P, that is $W = P^\perp$.
Then dim W = n − 1. For any element $w \in W$, $\|w\| = 1$, define the curve

$$C(t) = (\cos t)P + (\sin t)w.$$

The directions of unit vectors $w \in W$ are the directions tangent to the
sphere at the point P, as shown on the figure.

Figure 2

The curve C(t) lies on the sphere because $\|C(t)\| = 1$, as you can verify
at once by taking the dot product $C(t) \cdot C(t)$, and using the hypothesis
that $P \cdot w = 0$. Furthermore, C(0) = P, so C(t) is a curve on the sphere
passing through P. We also have the derivative

$$C'(t) = (-\sin t)P + (\cos t)w,$$

and so $C'(0) = w$. Thus the direction of the curve at P is the direction of
w, which is perpendicular to P, hence tangent to the sphere at P, because
$w \cdot P = 0$. Consider the function

$$g(t) = f(C(t)) = C(t) \cdot AC(t).$$

Using coordinates, and the rule for the derivative of a product which
applies in this case (as you might know from calculus), you find the
derivative:

$$g'(t) = C'(t) \cdot AC(t) + C(t) \cdot AC'(t) = 2C'(t) \cdot AC(t),$$

because A is symmetric. Since f(P) is a maximum and g(0) = f(P), it
follows that $g'(0) = 0$. Then we obtain:

$$0 = g'(0) = 2C'(0) \cdot AC(0) = 2w \cdot AP.$$

Hence AP is perpendicular to w for all $w \in W$, that is, AP lies in $W^\perp$.
But $W^\perp$ is the 1-dimensional space generated by P. Hence there is a
number $\lambda$ such that $AP = \lambda P$, thus proving the theorem.


Corollary 3.3. The maximum value of f on the unit sphere is equal to
the largest eigenvalue of A.


Proof. Let $\lambda$ be any eigenvalue and let P be an eigenvector on the
unit sphere, so $\|P\| = 1$. Then

$$f(P) = {}^tPAP = {}^tP\lambda P = \lambda\,{}^tPP = \lambda.$$

Thus the value of f at an eigenvector on the unit sphere is equal to the
eigenvalue. Theorem 3.2 tells us that the maximum of f on the unit
sphere occurs at an eigenvector. Hence the maximum of f on the unit
sphere is equal to the largest eigenvalue, as asserted.


Example. Let $f(x, y) = 2x^2 - 3xy + y^2$. Let A be the symmetric
matrix associated with f. Find the eigenvectors of A on the unit circle,
and find the maximum of f on the unit circle.

First we note that f is the quadratic form associated with the matrix

$$A = \begin{pmatrix} 2 & -\tfrac{3}{2} \\ -\tfrac{3}{2} & 1 \end{pmatrix}.$$

By Theorem 3.2 a maximum must occur at an eigenvector, so we first
find the eigenvalues and eigenvectors.
The characteristic polynomial is the determinant

$$\begin{vmatrix} t - 2 & \tfrac{3}{2} \\ \tfrac{3}{2} & t - 1 \end{vmatrix} = t^2 - 3t - \tfrac{1}{4},$$

whose roots are $\lambda = (3 \pm \sqrt{10})/2$. For the eigenvectors, we must solve

$$2x - \tfrac{3}{2}y = \lambda x,$$
$$-\tfrac{3}{2}x + y = \lambda y.$$

Putting x = 1 this gives the possible eigenvectors

$$X(\lambda) = \begin{pmatrix} 1 \\ \tfrac{2}{3}(2 - \lambda) \end{pmatrix}.$$

Thus there are two such eigenvectors, up to non-zero scalar multiples.
The eigenvectors lying on the unit circle are therefore

$$P(\lambda) = \frac{X(\lambda)}{\|X(\lambda)\|} \quad \text{with} \quad \lambda = \frac{3 + \sqrt{10}}{2} \quad \text{and} \quad \lambda = \frac{3 - \sqrt{10}}{2}.$$

By Corollary 3.3 the maximum is the point with the bigger eigenvalue,
and must therefore be the point

$$P(\lambda) \quad \text{with} \quad \lambda = \frac{3 + \sqrt{10}}{2}.$$

The maximum value of f on the unit circle is $(3 + \sqrt{10})/2$.
By the same token, the minimum value of f on the unit circle is

$$(3 - \sqrt{10})/2.$$
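
The conclusion of this example can be compared with a brute-force search. The sketch below samples f at equally spaced points of the unit circle (the number of sample points `N` is an arbitrary choice) and also evaluates f at the normalized eigenvector, taking $X(\lambda) = (1, \tfrac{2}{3}(2 - \lambda))$ as obtained from the first eigenvector equation.

```python
import math

def f(x, y):
    return 2 * x * x - 3 * x * y + y * y

# Sample f at N equally spaced points of the unit circle.
N = 100000
values = [f(math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N))
          for k in range(N)]
sampled_max, sampled_min = max(values), min(values)

lam_max = (3 + math.sqrt(10)) / 2
lam_min = (3 - math.sqrt(10)) / 2

def P(lam):
    """Unit eigenvector X(lam)/||X(lam)|| with X(lam) = (1, (2/3)(2 - lam))."""
    X = (1.0, 2 * (2 - lam) / 3)
    norm = math.hypot(*X)
    return (X[0] / norm, X[1] / norm)

print(sampled_max, lam_max, f(*P(lam_max)))
print(sampled_min, lam_min, f(*P(lam_min)))
```

The sampled extremes only approximate the true ones, whereas the value of f at the unit eigenvector agrees with the eigenvalue up to rounding, as Corollary 3.3 predicts.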


We shall now use the complex numbers C for the second proof. A 
fundamental property of complex numbers is that every non-constant 
polynomial with complex coefficients has a root (a zero) in the complex 
numbers. Therefore the characteristic polynomial of A has a complex 
root A, which is a priori a complex eigenvalue, with a complex eigen- 
vector. 


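Theorem 3.4. Let A be a real symmetric matrix and let $\lambda$ be an
eigenvalue in C. Then $\lambda$ is real. If $Z \neq O$ is a complex eigenvector with
eigenvalue $\lambda$, and $Z = X + iY$ where $X, Y \in \mathbf{R}^n$, then both X, Y are real
eigenvectors of A with eigenvalue $\lambda$, and X or $Y \neq O$.

Proof. Let $Z = {}^t(z_1,\ldots,z_n)$ with complex coordinates $z_i$. Then

$${}^tZ\bar{Z} = {}^t\bar{Z}Z = \bar{z}_1 z_1 + \cdots + \bar{z}_n z_n = |z_1|^2 + \cdots + |z_n|^2 > 0.$$

By hypothesis, we have $AZ = \lambda Z$. Then

$${}^t\bar{Z}AZ = {}^t\bar{Z}\lambda Z = \lambda\,{}^t\bar{Z}Z.$$

The transpose of a 1 x 1 matrix is equal to itself, so we also get

$${}^tZ\,{}^tA\bar{Z} = {}^t\bar{Z}AZ = \lambda\,{}^t\bar{Z}Z.$$

But $\overline{AZ} = \bar{A}\bar{Z} = A\bar{Z}$ and $\overline{AZ} = \overline{\lambda Z} = \bar{\lambda}\bar{Z}$. Therefore, since ${}^tA = A$,

$$\lambda\,{}^t\bar{Z}Z = {}^tZA\bar{Z} = \bar{\lambda}\,{}^tZ\bar{Z}.$$

Since ${}^tZ\bar{Z} \neq 0$ it follows that $\lambda = \bar{\lambda}$, so $\lambda$ is real.

Now from $AZ = \lambda Z$ we get

$$AX + iAY = \lambda X + i\lambda Y,$$

and since A, $\lambda$, X, Y are real it follows that $AX = \lambda X$ and $AY = \lambda Y$. This
proves the theorem.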


Exercises VIII, §3


1. Find the eigenvalues of the following matrices, and the maximum value of the
associated quadratic forms on the unit circle.


2 -1 | 3 
(a) * 3 (b) ( H 


2. Same question, except find the maximum on the unit sphere. 


1-1 0 2- b. 20 
Gode 2533 (by [er 3021 
0 -1 1 0 —1 2 


3. Find the maximum and minimum of the function

$$f(x, y) = 3x^2 + 5xy - 4y^2$$

on the unit circle.


VIII, §4. Diagonalization of a Symmetric Linear Map 


In this section we give an application of the existence of eigenvectors as
proved in §3. Since we shall do an induction, instead of working with $\mathbf{R}^n$
we have to start with a formulation dealing with a vector space in which
coordinates have not yet been chosen.

So let V be a vector space of dimension n over $\mathbf{R}$, with a positive
definite scalar product $\langle v, w \rangle$ for $v, w \in V$. Let

$$A\colon V \to V$$

be a linear map. We shall say that A is symmetric (with respect to the
scalar product) if we have the relation

$$\langle Av, w \rangle = \langle v, Aw \rangle$$

for all $v, w \in V$.

Example. Suppose $V = \mathbf{R}^n$ and that the scalar product is the usual
dot product between vectors. A linear map A is represented by a matrix,
and we use the same letter A to denote this matrix. Then for all v,
$w \in \mathbf{R}^n$ we have

$$\langle Av, w \rangle = {}^twAv$$

if we view v, w as column vectors. Since ${}^twAv$ is a 1 x 1 matrix, it is
equal to its transpose. Thus we have the formula

$${}^twAv = {}^tv\,{}^tAw,$$

or in terms of the $\langle\ ,\ \rangle$ notation,

$$\langle Av, w \rangle = \langle v, {}^tAw \rangle.$$

The condition that A is symmetric as a linear map with respect to the
scalar product is by definition $\langle Av, w \rangle = \langle v, Aw \rangle$, or in terms of matrix
multiplication

$${}^twAv = {}^tvAw.$$

Comparing with the previous formula, this means that $A = {}^tA$. Thus we
find:


Let A be an n x n matrix, and $L_A$ the associated linear map on $\mathbf{R}^n$.
Then $L_A$ is symmetric with respect to the scalar product if and only if
A is a symmetric matrix.


If V is a general vector space of dimension n with a positive definite
scalar product, and $A\colon V \to V$ is a linear map of V into itself, then there
is a unique linear map

$${}^tA\colon V \to V$$

satisfying the formula

$$\langle Av, w \rangle = \langle v, {}^tAw \rangle.$$

We have just seen this when $V = \mathbf{R}^n$. In general, we pick an
orthonormal basis of V which allows us to identify V with $\mathbf{R}^n$, and to identify the
scalar product with the ordinary dot product. Then ${}^tA$ (as a linear map)
coincides with the transpose of the matrix representing A.

We can now say that a linear map $A\colon V \to V$ is symmetric if and only
if it is equal to its own transpose. When V is identified with $\mathbf{R}^n$ by using
an orthonormal basis, this means that the matrix representing A is equal
to its transpose, in other words, the matrix is symmetric.

We can reformulate Theorem 3.1 as follows: 


Let V be a finite dimensional vector space with a positive definite scalar
product. Let $A\colon V \to V$ be a symmetric linear map. Then A has a
non-zero eigenvector.


Let W be a subspace of V, and let $A\colon V \to V$ be a symmetric linear map.
We say that W is stable under A if $A(W) \subset W$, that is for all $u \in W$ we
have $Au \in W$.


We note that if W is stable under A then its orthogonal complement $W^\perp$
is also stable under A.


Proof. Let $w \in W^\perp$. Then for all $u \in W$ we have

$$\langle Aw, u \rangle = \langle w, Au \rangle = 0$$

because $Au \in W$ and $w \in W^\perp$. Hence $Aw \in W^\perp$, thus proving the assertion.


Theorem 4.1. Let V be a finite dimensional vector space over the real
numbers, of dimension n > 0, and with a positive definite scalar product.
Let

$$A\colon V \to V$$

be a linear map, symmetric with respect to the scalar product. Then V
has an orthonormal basis consisting of eigenvectors.


Proof. By Theorem 3.1, there exists a non-zero eigenvector P for A.
Let W be the one-dimensional space generated by P. Then W is stable
under A. By the above remark, $W^\perp$ is also stable under A and is a
vector space of dimension n − 1. We may then view A as giving a
symmetric linear map of $W^\perp$ into itself. We can then repeat the procedure.
We put $P = P_1$, and by induction we can find an orthogonal basis
$\{P_2,\ldots,P_n\}$ of $W^\perp$ consisting of eigenvectors. Then

$$\{P_1, P_2, \ldots, P_n\}$$

is an orthogonal basis of V consisting of eigenvectors. We divide each
vector by its norm to get an orthonormal basis, as desired.


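If $\{e_1,\ldots,e_n\}$ is an orthonormal basis of V such that each $e_i$ is an
eigenvector, then the matrix of A with respect to this basis is diagonal,
and the diagonal elements are precisely the eigenvalues:

$$\begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.$$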

In such a simple representation, the effect of A then becomes much 
clearer than when A is represented by a more complicated matrix with 
respect to another basis. 
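
For a 2 x 2 symmetric matrix the orthonormal basis of eigenvectors can be written down directly, which gives a small check of the statement above. This is a sketch under stated assumptions: the matrix is the one from the first example of §3, the closed formulas below hold for a symmetric matrix with rows (a, b), (b, d) and $b \neq 0$, and all helper names are introduced only for illustration.

```python
import math

A = [[3.0, -1.0],
     [-1.0, 2.0]]
(a, b), (_, d) = A

# Eigenvalues of a symmetric 2 x 2 matrix from the quadratic formula.
root = math.sqrt((a - d) ** 2 + 4 * b * b)
eigenvalues = [(a + d + root) / 2, (a + d - root) / 2]

def unit_eigenvector(lam):
    """(b, lam - a) solves (a - lam)x + b y = 0; valid here because b != 0."""
    x, y = b, lam - a
    norm = math.hypot(x, y)
    return [x / norm, y / norm]

e1, e2 = (unit_eigenvector(lam) for lam in eigenvalues)

def dot(v, w):
    return sum(p * q for p, q in zip(v, w))

def apply(M, v):
    return [dot(row, v) for row in M]

# Matrix of A with respect to the basis {e1, e2}: entry (i, j) is <e_i, A e_j>.
D = [[dot(ei, apply(A, ej)) for ej in (e1, e2)] for ei in (e1, e2)]
print(D)
```

Up to rounding, D is diagonal with the eigenvalues on the diagonal, and `dot(e1, e2)` vanishes, so the basis is orthonormal.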


Example. We give an application to linear differential equations. Let
A be an n x n symmetric real matrix. We want to find the solutions in
$\mathbf{R}^n$ of the differential equation

$$\frac{dX(t)}{dt} = AX(t),$$

where

$$X(t) = \begin{pmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{pmatrix}$$

is given in terms of coordinates which are functions of t, and

$$\frac{dX(t)}{dt} = \begin{pmatrix} dx_1/dt \\ \vdots \\ dx_n/dt \end{pmatrix}.$$

Writing this equation in terms of arbitrary coordinates is messy. So let
us forget at first about coordinates, and view $\mathbf{R}^n$ as an n-dimensional
vector space V with a positive definite scalar product. We choose an
orthonormal basis of V (usually different from the original basis)
consisting of eigenvectors of A. Now with respect to this new basis, we can
identify V with $\mathbf{R}^n$ with new coordinates which we denote by $y_1,\ldots,y_n$.

With respect to these new coordinates, the matrix of the linear map $L_A$
is

$$\begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}$$

where $\lambda_1,\ldots,\lambda_n$ are the eigenvalues. But in terms of these more
convenient coordinates, our differential equation simply reads

$$\frac{dy_1}{dt} = \lambda_1 y_1, \quad \ldots, \quad \frac{dy_n}{dt} = \lambda_n y_n.$$

Thus the most general solution is of the form

$$y_i(t) = c_i e^{\lambda_i t} \quad \text{with some constant } c_i.$$


The moral of this example is that one should not select a basis too 
quickly, and one should use as often as possible a notation without 
coordinates, until a choice of coordinates becomes imperative to make 
the solution of a problem simpler. 


Exercises VIII, $4 


1. Suppose that A is a diagonal n x n matrix. For any X eR”, what is 'X AX in 
terms of the coordinates of X and the diagonal elements of A? 


2. Let 


a, 0 0 
4° 4 0 
0 0 i: 


be a diagonal matrix with 4, 2 0,...,4,2z 0. Show that there exists an n x n 
diagonal matrix B such that B? — A. 


3. Let V be a finite dimensional vector space with a positive definite scalar pro- 
duct. Let A: VV be a symmetric linear map. We say that A is positive 
definite if (Av, v» > O for all ve V and v #0. Prove: 

(a) If A is positive definite, then all eigenvalues are > 0. 

(b) If A is positive definite, then there exists a symmetric linear map B such 
that B? = A and BA = AB. What are the eigenvalues of B? [Hint: Use a 
basis of V consisting of eigenvectors. ] 
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4. We say that A is positive semidefinite if (Av, v» Z O for all ve V. Prove the 
analogues of (a), (b) of Exercise 3 when A is only assumed semidefinite. Thus 
the eigenvalues are 2 0, and there exists a symmetric linear map B such that 
B? — A. 


5. Assume that A is symmetric positive definite. Show that A? and A ! are 
symmetric positive definite. 


6. Let $U: \mathbf{R}^n \to \mathbf{R}^n$ be a linear map, and let $\langle\ ,\ \rangle$ denote the usual scalar (dot)
product. Show that the following conditions are equivalent:
(i) $\|Uv\| = \|v\|$ for all $v \in \mathbf{R}^n$.
(ii) $\langle Uv, Uw\rangle = \langle v, w\rangle$ for all $v, w \in \mathbf{R}^n$.
(iii) $U$ is invertible, and ${}^tU = U^{-1}$.
[Hint: For (ii), use the identity

$$\langle (v + w), (v + w)\rangle - \langle (v - w), (v - w)\rangle = 4\langle v, w\rangle,$$

and similarly with a $U$ in front of each vector.] When $U$ satisfies any one
(and hence all) of these conditions, then $U$ is called unitary. The first
condition says that $U$ preserves the norm, and the second says that $U$ preserves the
scalar product.


7. Let $A: \mathbf{R}^n \to \mathbf{R}^n$ be an invertible linear map.
(i) Show that ${}^tAA$ is symmetric positive definite.
(ii) By Exercise 3b, there is a symmetric positive definite $B$ such that $B^2 = {}^tAA$.
Let $U = AB^{-1}$. Show that $U$ is unitary.
(iii) Show that $A = UB$.


8. Let B be symmetric positive definite and also unitary. Show that B= I. 


Appendix. Complex Numbers 


The complex numbers C are a set of objects which can be added and 
multiplied, the sum and product of two complex numbers being also 
complex numbers, and satisfying the following conditions: 


(1) Every real number is a complex number, and if $\alpha$, $\beta$ are real
numbers, then their sum and product as complex numbers are
the same as their sum and product as real numbers.

(2) There is a complex number denoted by $i$ such that $i^2 = -1$.

(3) Every complex number can be written uniquely in the form
$a + bi$, where $a$, $b$ are real numbers.

(4) The ordinary laws of arithmetic concerning addition and
multiplication are satisfied. We list these laws:


If $\alpha$, $\beta$, $\gamma$ are complex numbers, then

$$(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma) \quad\text{and}\quad (\alpha\beta)\gamma = \alpha(\beta\gamma).$$

We have $\alpha(\beta + \gamma) = \alpha\beta + \alpha\gamma$ and $(\beta + \gamma)\alpha = \beta\alpha + \gamma\alpha$.

We have $\alpha\beta = \beta\alpha$ and $\alpha + \beta = \beta + \alpha$.

If 1 is the real number one, then $1\alpha = \alpha$.

If 0 is the real number zero, then $0\alpha = 0$.

We have $\alpha + (-1)\alpha = 0$.


We shall now draw consequences of these properties. If we write

$$\alpha = a_1 + a_2 i \quad\text{and}\quad \beta = b_1 + b_2 i,$$

then

$$\alpha + \beta = a_1 + a_2 i + b_1 + b_2 i = a_1 + b_1 + (a_2 + b_2)i.$$

If we call $a_1$ the real part, or real component of $\alpha$, and $a_2$ its imaginary
part, or imaginary component, then we see that addition is carried out
componentwise. The real part and imaginary part of $\alpha$ are denoted by
$\operatorname{Re}(\alpha)$ and $\operatorname{Im}(\alpha)$ respectively.

We have

$$\alpha\beta = (a_1 + a_2 i)(b_1 + b_2 i) = a_1 b_1 - a_2 b_2 + (a_2 b_1 + a_1 b_2)i.$$

Let $\alpha = a + bi$ be a complex number with $a$, $b$ real. We define

$$\bar{\alpha} = a - bi$$

and call $\bar{\alpha}$ the complex conjugate, or simply conjugate, of $\alpha$. Then

$$\alpha\bar{\alpha} = a^2 + b^2.$$

If $\alpha = a + bi$ is $\neq 0$, and if we let

$$\lambda = \frac{\bar{\alpha}}{a^2 + b^2},$$

then $\alpha\lambda = \lambda\alpha = 1$, as we see immediately. The number $\lambda$ above is called
the inverse of $\alpha$ and is denoted by $\alpha^{-1}$, or $1/\alpha$. We note that it is the
only complex number $z$ such that $z\alpha = 1$, because if this equation is
satisfied, we multiply it by $\lambda$ on the right to find $z = \lambda$. If $\alpha$, $\beta$ are
complex numbers, we often write $\beta/\alpha$ instead of $\alpha^{-1}\beta$ or $\beta\alpha^{-1}$. We see
that we can divide by complex numbers $\neq 0$.
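
As a small illustration of the inverse formula, here is a sketch in Python using the built-in `complex` type. The function name `inverse` and the sample value are illustrative choices and do not come from the text.

```python
def inverse(alpha: complex) -> complex:
    """Return conj(alpha) / (a^2 + b^2), the inverse of a non-zero alpha = a + bi."""
    a, b = alpha.real, alpha.imag
    norm_sq = a * a + b * b          # equals alpha * conj(alpha)
    if norm_sq == 0:
        raise ZeroDivisionError("0 has no inverse")
    return alpha.conjugate() / norm_sq


alpha = 3 + 4j
lam = inverse(alpha)                 # (3 - 4i) / 25
product = alpha * lam                # equals 1 up to rounding error
```

Dividing $\beta$ by $\alpha$ then amounts to computing `beta * inverse(alpha)`.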

We have the rules

$$\overline{\alpha\beta} = \bar{\alpha}\bar{\beta}, \qquad \overline{\alpha + \beta} = \bar{\alpha} + \bar{\beta}, \qquad \bar{\bar{\alpha}} = \alpha.$$

These follow at once from the definitions of addition and multiplication.

We define the absolute value of a complex number $\alpha = a + bi$ to be

$$|\alpha| = \sqrt{a^2 + b^2}.$$

If we think of $\alpha$ as a point in the plane $(a, b)$, then $|\alpha|$ is the length of
the line segment from the origin to $\alpha$. In terms of the absolute value, we
can write

$$\alpha^{-1} = \frac{\bar{\alpha}}{|\alpha|^2}$$

provided $\alpha \neq 0$. Indeed, we observe that $|\alpha|^2 = \alpha\bar{\alpha}$. Note also that
$|\bar{\alpha}| = |\alpha|$.

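The absolute value satisfies properties analogous to those satisfied by
the absolute value of real numbers:

$|\alpha| \geq 0$ and $= 0$ if and only if $\alpha = 0$.

$$|\alpha\beta| = |\alpha|\,|\beta|,$$

$$|\alpha + \beta| \leq |\alpha| + |\beta|.$$

The first assertion is obvious. As to the second, we have

$$|\alpha\beta|^2 = \alpha\beta\bar{\alpha}\bar{\beta} = |\alpha|^2 |\beta|^2.$$

Taking the square root, we conclude that $|\alpha|\,|\beta| = |\alpha\beta|$. Next, we have

$$|\alpha + \beta|^2 = (\alpha + \beta)(\overline{\alpha + \beta}) = (\alpha + \beta)(\bar{\alpha} + \bar{\beta}) = \alpha\bar{\alpha} + \beta\bar{\alpha} + \alpha\bar{\beta} + \beta\bar{\beta} = |\alpha|^2 + 2\operatorname{Re}(\beta\bar{\alpha}) + |\beta|^2$$

because $\alpha\bar{\beta} = \overline{\beta\bar{\alpha}}$. However, we have

$$2\operatorname{Re}(\beta\bar{\alpha}) \leq 2|\beta\bar{\alpha}|$$

because the real part of a complex number is $\leq$ its absolute value.
Hence

$$|\alpha + \beta|^2 \leq |\alpha|^2 + 2|\beta\bar{\alpha}| + |\beta|^2 \leq |\alpha|^2 + 2|\beta|\,|\alpha| + |\beta|^2 = (|\alpha| + |\beta|)^2.$$

Taking the square root yields the final property.


Let $z = x + iy$ be a complex number $\neq 0$. Then $z/|z|$ has absolute
value 1.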
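
The properties of the absolute value can be spot-checked numerically. The following Python sketch evaluates them for one sample pair; the function names and the tolerance `tol` are assumptions made for this illustration. A numerical check of this kind illustrates the statements and is no substitute for the proof.

```python
import math


def absolute_value(alpha: complex) -> float:
    """|alpha| = sqrt(a^2 + b^2) for alpha = a + bi."""
    return math.hypot(alpha.real, alpha.imag)


def check_properties(alpha: complex, beta: complex, tol: float = 1e-12) -> dict:
    """Evaluate the stated properties for one pair of complex numbers."""
    abs_a, abs_b = absolute_value(alpha), absolute_value(beta)
    return {
        # |alpha * beta| = |alpha| |beta|
        "multiplicative": abs(absolute_value(alpha * beta) - abs_a * abs_b) <= tol,
        # |alpha + beta| <= |alpha| + |beta|
        "triangle": absolute_value(alpha + beta) <= abs_a + abs_b + tol,
        # |conj(alpha)| = |alpha|
        "conjugate": abs(absolute_value(alpha.conjugate()) - abs_a) <= tol,
    }


def normalize(z: complex) -> complex:
    """Return z / |z|, which has absolute value 1 when z != 0."""
    return z / absolute_value(z)


results = check_properties(3 + 4j, 1 - 2j)
unit = normalize(3 + 4j)
```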


The main advantage of working with complex numbers rather than
real numbers is that every non-constant polynomial with complex
coefficients has a root in $\mathbf{C}$. This is proved in more advanced courses in
analysis. For instance, a quadratic equation

$$ax^2 + bx + c = 0,$$

with $a \neq 0$ has the roots

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$

If $b^2 - 4ac$ is positive, then the roots are real. If $b^2 - 4ac$ is negative,
then the roots are complex. The proof for the quadratic formula uses
only the basic arithmetic of addition, multiplication, and division.
Namely, we complete the square to see that

$$ax^2 + bx + c = a\left(x + \frac{b}{2a}\right)^2 - \frac{b^2}{4a} + c = a\left(x + \frac{b}{2a}\right)^2 - \frac{b^2 - 4ac}{4a}.$$

Then we solve

$$\left(x + \frac{b}{2a}\right)^2 = \frac{b^2 - 4ac}{4a^2},$$

take the square root, and finally get the desired value for $x$.
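
The quadratic formula can be tried out directly. The sketch below uses Python's standard `cmath` module, whose `sqrt` returns a complex square root, so the same code covers positive and negative discriminants. The function name `quadratic_roots` is an illustrative choice.

```python
import cmath


def quadratic_roots(a, b, c):
    """Return the two roots of a*x^2 + b*x + c = 0 with a != 0."""
    if a == 0:
        raise ValueError("a must be non-zero")
    root = cmath.sqrt(b * b - 4 * a * c)   # complex square root of the discriminant
    return (-b + root) / (2 * a), (-b - root) / (2 * a)


real_case = quadratic_roots(1, -3, 2)      # discriminant 1 > 0: roots 2 and 1
complex_case = quadratic_roots(1, 0, 1)    # discriminant -4 < 0: roots i and -i
```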


Application to vector spaces 


To define the notion of a vector space, we need first the notion of sca- 
lars. And the only facts we need about scalars are those connected with 
addition, multiplication, and division by non-zero elements. These basic 
operations of arithmetic are all satisfied by the complex numbers. There- 
fore we can do the basic theory of vector spaces over the complex 
numbers. We have the same theorems about linear combinations, 
matrices, row rank, column rank, dimension, determinants, characteristic 
polynomials, eigenvalues. 

The only basic difference (and it is slight) comes when we deal with
the dot product. If $Z = (z_1, \ldots, z_n)$ and $W = (w_1, \ldots, w_n)$ are $n$-tuples in
$\mathbf{C}^n$, then their dot product is as before

$$Z \cdot W = z_1 w_1 + \cdots + z_n w_n.$$

But observe that even if $Z \neq O$, then $Z \cdot Z$ may be 0. For instance, let
$Z = (1, i)$ in $\mathbf{C}^2$. Then

$$Z \cdot Z = 1 + i^2 = 1 - 1 = 0.$$

Hence the dot product is not positive definite.

To remedy this, one defines a product which is called hermitian and is
almost equal to the dot product, but contains a complex conjugate. That
is we define $\bar{W} = (\bar{w}_1, \ldots, \bar{w}_n)$ and

$$\langle Z, W\rangle = Z \cdot \bar{W} = z_1\bar{w}_1 + \cdots + z_n\bar{w}_n,$$

so we put a complex conjugate on the coordinates of $W$. Then

$$\langle Z, Z\rangle = Z \cdot \bar{Z} = z_1\bar{z}_1 + \cdots + z_n\bar{z}_n = |z_1|^2 + \cdots + |z_n|^2.$$

Hence once again, if $Z \neq O$, then some coordinate $z_i \neq 0$, so the sum on
the right is $\neq 0$ and $\langle Z, Z\rangle > 0$.
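If $\alpha$ is a complex number, then from the definition we see that

$$\langle \alpha Z, W\rangle = \alpha\langle Z, W\rangle \quad\text{but}\quad \langle Z, \alpha W\rangle = \bar{\alpha}\langle Z, W\rangle.$$

Thus a complex conjugate appears in the second formula. We still have
the formulas expressing additivity

$$\langle Z_1 + Z_2, W\rangle = \langle Z_1, W\rangle + \langle Z_2, W\rangle$$

and

$$\langle Z, W_1 + W_2\rangle = \langle Z, W_1\rangle + \langle Z, W_2\rangle.$$

We then say that the hermitian product is linear in the first variable and
antilinear in the second variable. Note that instead of commutativity of
the hermitian product, we have the formula

$$\langle Z, W\rangle = \overline{\langle W, Z\rangle}.$$

If $Z$, $W$ are real vectors, then the hermitian product $\langle Z, W\rangle$ is the same
as the dot product.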
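
The contrast between the dot product and the hermitian product on $\mathbf{C}^n$ can be made concrete with a short Python sketch. The function names `dot` and `hermitian` and the sample vectors are illustrative choices, not notation from the text.

```python
def dot(Z, W):
    """Plain dot product z_1 w_1 + ... + z_n w_n, with no conjugation."""
    return sum(complex(z) * complex(w) for z, w in zip(Z, W))


def hermitian(Z, W):
    """Hermitian product: a complex conjugate is put on the coordinates of W."""
    return sum(complex(z) * complex(w).conjugate() for z, w in zip(Z, W))


Z = (1, 1j)
W = (2 - 1j, 3j)

dot_ZZ = dot(Z, Z)              # 1 + i^2 = 0, although Z is not the zero vector
herm_ZZ = hermitian(Z, Z)       # |1|^2 + |i|^2 = 2
herm_ZW = hermitian(Z, W)
herm_WZ = hermitian(W, Z)       # the complex conjugate of herm_ZW
```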

One can then develop the Gram-Schmidt orthogonalization process 
just as before using the hermitian product rather than the dot product. 

In the application of this chapter we did not need the hermitian pro- 
duct. All we needed was that a complex n x n matrix A has an eigen- 
value, and that the eigenvalues are the roots of the characteristic 
polynomial 

det(tI — A). 


As mentioned before, a non-constant polynomial with complex coeffi- 
cients always has a root in the complex numbers, so A always has an 
eigenvalue in C. In the text, we showed that when A is real symmetric, 
then such eigenvalues must in fact be real. 


Answers to Exercises 


I, §1, p. 8


2. $(-1, 7)$ $(-1, -1)$ $(-3, 9)$ $(0, -8)$
3. $(1, 0, 6)$ $(3, -2, 4)$ $(6, -3, 15)$ $(2, -2, -2)$
4. $(-2, 1, -1)$ $(0, -5, 7)$ $(-3, -6, 9)$ $(2, -6, 8)$
5. $(3\pi, 0, 6)$ $(-\pi, 6, -8)$ $(3\pi, 9, -3)$ $(-4\pi, 6, -14)$
6. $(15 + \pi, 1, 3)$ $(15 - \pi, -5, 5)$ $(45, -6, 12)$ $(-2\pi, -6, 2)$


I, §2, p. 12


1. No 2. Yes 3. No 4. Yes 5. No 6. Yes 7. Yes 8. No 


I, §3, p. 15

1. (a) 5 (b) 10 (c) 30 (d) 14 (e) $\pi^2 + 10$ (f) 245

2. (a) $-3$ (b) 12 (c) 2 (d) $-17$ (e) $2\pi^2 - 16$ (f) $15\pi - 10$
4. (b) and (d)


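I, §4, p. 29


1. (a) $\sqrt{5}$ (b) $\sqrt{10}$ (c) $\sqrt{30}$ (d) $\sqrt{14}$ (e) $\sqrt{10 + \pi^2}$ (f) $\sqrt{245}$
2. (a) $\sqrt{2}$ (b) 4 (c) $\sqrt{3}$ (d) $\sqrt{26}$ (e) $\sqrt{58 + 4\pi^2}$ (f) $\sqrt{10 + \pi^2}$
3. (a) $(\tfrac{3}{2}, -\tfrac{3}{2})$ (b) $(0, 3)$ (c) $(-\tfrac{2}{3}, \tfrac{2}{3}, \tfrac{2}{3})$ (d) $(\tfrac{17}{26}, -\tfrac{51}{26}, \tfrac{34}{13})$
   (e) $\dfrac{\pi^2 - 8}{2\pi^2 + 29}(2\pi, -3, 7)$ (f) $\dfrac{15\pi - 10}{10 + \pi^2}(\pi, 3, -1)$
4. (a) $(-\tfrac{6}{5}, \tfrac{3}{5})$ (b) $(-\tfrac{6}{5}, \tfrac{18}{5})$ (c) $(\tfrac{2}{15}, -\tfrac{1}{15}, \tfrac{1}{3})$ (d) $-\tfrac{17}{14}(-1, -2, 3)$
   (e) $\dfrac{2\pi^2 - 16}{\pi^2 + 10}(\pi, 3, -1)$ (f) $\dfrac{3\pi - 2}{49}(15, -2, 4)$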


-1 -2 10 13 -1 

Qmd ctun i cartes, b EIL ERR ARUM d Dur cu 

D 75 Fra eae 9 Ta Jas O fa i RTT 
35 6 


1 16 25 
—_— =) b bd 9 
P 41-35 ./41-6 ) AITZ% 41-17 ./26-41 


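7. Let us dot the sum

$$c_1A_1 + \cdots + c_rA_r = O$$

with $A_i$. We find

$$c_1A_1 \cdot A_i + \cdots + c_iA_i \cdot A_i + \cdots + c_rA_r \cdot A_i = O \cdot A_i = 0.$$

Since $A_j \cdot A_i = 0$ if $j \neq i$ we find

$$c_iA_i \cdot A_i = 0.$$

But $A_i \cdot A_i \neq 0$ by assumption. Hence $c_i = 0$, as was to be shown.
8. (a) $\|A + B\|^2 + \|A - B\|^2 = (A + B) \cdot (A + B) + (A - B) \cdot (A - B)$
$= A^2 + 2A \cdot B + B^2 + A^2 - 2A \cdot B + B^2$
$= 2A^2 + 2B^2 = 2\|A\|^2 + 2\|B\|^2$
9. $\|A - B\|^2 = A^2 - 2A \cdot B + B^2 = \|A\|^2 - 2\|A\|\,\|B\|\cos\theta + \|B\|^2$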


I, §5, p. 34


1. (a) Let $A = P_2 - P_1 = (-5, -2, 3)$. Parametric representation of the line is
$X(t) = P_1 + tA = (1, 3, -1) + t(-5, -2, 3)$.


2. X — (1, 1, — D + t, 0, —4) 3. x =(—1, 5, 2) + t(—4, 9 1) 
4. (a) (—3. 4, 3) (b) (— a 45, 0), (— 3: 53b) (c) (0, 4 E —#) (d) (— hers: 5) 


Pre 


5. P-i(Q—P)- 
I, §6, p. 40


1. The normal vectors $(2, 3)$ and $(5, -5)$ are not perpendicular because their dot
product $10 - 15 = -5$ is not 0.


2. The normal vectors are $(-m, 1)$ and $(-m', 1)$, and their dot product is
$mm' + 1$. The vectors are perpendicular if and only if this dot product is 0,
which is equivalent with $mm' = -1$.


~y=x4+8 4. 4y25x—7 6. (c) and (d) 

.(a x—-y-3z—2-—1 (b)3x *2y—4z 2n +26 (c) x —5z= —33 
.(a) 2x+y4+2z=7 (b)7x—8y —9z—2 —29 (co) y+z=1 

. (3, —9, —5), (1, 5, —7) (Others would be constant multiples of these.) 


ki 2s. 1:5) itari =p 
. (a) X =(1, 0, —1) + ((—2, 1, 5) 


(b) X =(—10, —13, 7) + t(11, 13, —7) or also (1, 0, 0) + t(11, 13, — 7) 


OI MR ME. M eee 
. la) —4à = C). —— = 
a2 ^ 66 18 
. (a) (—4, P? 15) (b) TER 3 — h) 15. (1, 3, —2) 
(a) : (b) 3 
d) ^ W 
S xf 2 
II, §1, p. 46
"UT 2 ot ee 1 
P =( J -( 3 3 3) 
saf 2-0 4 jee 0 oed 
7 =(_; —2 ;) * -( 1 2 o) 
"IE "T o m 
ý =(_, K 7 =(3 E 
"UTI: om "EE = cese 
E =(_; 2 i) E -{ 2. | e 


Rows of A:(1, 2, 3, (— 1, 0, 2) 


Col f A: : ? 
olumns of A: 1h lo, lo 


Rows of B:(—1, 5, —2), (1, 1, —1) 
Col bg Te oe
olumns of B: peda pl 4 


1 -1 
Rows of A: (1, —1), (2, 1) Columns of ^() | 


—| I 
Rows of B:(—1, 1), (0, —3) Columns of 6: ( J (_;) 


| —1 —1 1 


3 2 —2 -—1 


"2 a 0 
ecu i Ci) 


sum of the ji-component of A plus the ji-component of B. 


S 8 : : 
. Same lo aF same 


ae. pepe e n 
' E JG d VE 


10. (a) ${}^t(A + {}^tA) = {}^tA + {}^t({}^tA) = {}^tA + A = A + {}^tA$.
(b) ${}^t(A - {}^tA) = {}^tA - {}^t({}^tA) = -(A - {}^tA)$.
(c) The diagonal elements are 0 because they satisfy $a_{ii} = -a_{ii}$.


II, §2, p. 58


.IA=AI=A 2. 0 

BD ecto 33. 37 
s (4 4 ini 2 " E 
aml A pa lP t 
uo e M 


7 14 14 0 
c2 ca-( ) BC = CB =( 


21 —7 
If C = xI, where x is a number, then AC = CA = xA. 


. (3, 1, 5), first row 


. Second row, third row, i-th row 


(96) 60 


ji? 


which is the 


10. 


11. 


13. 


14. 


16. 


17. 


18. 
19. 


20. 


21. 


22 
3 12 5
(941 ] O73] (0|4 
2 9 8 


Second column of A 12. j-th column of A 


: 3 X2 0 
(+) © () (3) «( 
5 1 


a ax+b 
(a) ( P. i} Add a multiple of the first column to the second column. 


Other cases are similar. 


0 1 1 
(a A*=10 0 oO], A'—Omatrix IfB-[ 5 o o q [then 
000 
0000 
00 12 0 0 0 1 
0001 000 0 
B? = ZEE P= d B^-O. 
0000 0000] ®© 2 
0000 0000 
123 1 3 6 1 4 10 
(bo) 42=10 12] 43=l0 1 3), at=lo 1 4 
0 0 1 0 0 1 0 0 1 
1 0 0 1 0 0 1 0 0 
0 4 of [o s ol 016 0 
00 9 0 027 0 081 


k 
n° 


Diagonal matrix with diagonal a‘, a5, ...,a 


0, 0 


0 1 
(| 19 

a b i 0 0 
(b) e on for any a, b z 0; if b = 0, then M 


(a) Inverse is $I + A$.
(b) Multiply $I - A$ by $I + A + A^2$ on each side. What do you get?


(a) Multiply each side of the relation $B = TAT^{-1}$ on the left by $T^{-1}$ and on
the right by $T$. We get

$$T^{-1}BT = T^{-1}TAT^{-1}T = IAI = A.$$

Hence there exists a matrix, namely $T^{-1}$, such that $T^{-1}BT = A$. This
means that $B$ is similar to $A$.
(b) Suppose $A$ has the inverse $A^{-1}$. Then $TA^{-1}T^{-1}$ is an inverse for $B$
because

$$TA^{-1}T^{-1}B = TA^{-1}T^{-1}TAT^{-1} = TA^{-1}AT^{-1} = TT^{-1} = I.$$

And similarly $BTA^{-1}T^{-1} = I$.
(c) Take the transpose of the relation $B = TAT^{-1}$. We get

$${}^tB = {}^t(T^{-1})\,{}^tA\,{}^tT.$$

This means that ${}^tB$ is similar to ${}^tA$, because there exists a matrix, namely
${}^t(T^{-1}) = C$, such that ${}^tB = C\,{}^tA\,C^{-1}$.


23. Diagonal elements are $a_{11}b_{11}, \ldots, a_{nn}b_{nn}$. They multiply componentwise.


1 a+b 1 na 
(o ^i (s ") 
25 1 —a 

"40 1 


26. Multiply $AB$ on each side by $B^{-1}A^{-1}$. What do you get? Note the order in
which the inverses are taken.


27. (a) The addition formula for cosine is

$$\cos(\theta_1 + \theta_2) = \cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2.$$

This and the formula for the sine will give what you want.

(b) $A(\theta)^{-1} = A(-\theta)$. Multiply $A(\theta)$ by $A(-\theta)$, what do you get?

(c) $A^n = \begin{pmatrix} \cos n\theta & -\sin n\theta \\ \sin n\theta & \cos n\theta \end{pmatrix}$. You can prove this by induction. Take the
product of $A^n$ with $A$. What do you get?
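
The three statements about rotation matrices can be checked numerically before proving them. The Python sketch below assumes that $A(\theta)$ denotes the usual 2 × 2 rotation matrix; the helper names and the tolerance are illustrative choices.

```python
import math


def rotation(theta):
    """The 2 x 2 rotation matrix A(theta), stored as a list of rows."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(A, B):
    """Product of two 2 x 2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def power(A, n):
    """A multiplied by itself n times, starting from the unit matrix."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        result = matmul(result, A)
    return result


def close(A, B, tol=1e-12):
    """True if all components of A and B agree up to tol."""
    return all(abs(A[i][j] - B[i][j]) <= tol for i in range(2) for j in range(2))


theta = 0.3
sum_rule = close(matmul(rotation(0.3), rotation(0.5)), rotation(0.8))
inverse_rule = close(matmul(rotation(theta), rotation(-theta)), [[1.0, 0.0], [0.0, 1.0]])
power_rule = close(power(rotation(theta), 5), rotation(5 * theta))
```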


js -N RI SE. ON pak: 10 
(9) eo 3) © a) e( s 2) 


oz | y o s (v7 n e - ) 
2 t5. 4 241 8) $ A\-1 -1 
cosÜ sin 60 


1 
29. . — (—1, 3 31. (—3, —1 
£ e: cos i as VA >?) ( ) 


32. The coordinates of $Y$ are given by

$$y_1 = x_1\cos\theta - x_2\sin\theta,$$

$$y_2 = x_1\sin\theta + x_2\cos\theta.$$

Find $y_1^2 + y_2^2$ by expanding out, using simple arithmetic. Lots of terms will
cancel out to leave $x_1^2 + x_2^2$.
33. (a) $\begin{pmatrix} 1 & 4 & 2 & -2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$ (b) $\begin{pmatrix} 0 & 0 & 0 & 0 \\ 2 & 3 & -1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$


34. (a) Interchange first and second row of A. 
(b) Interchange second and third row of A. 
(c) Add five times second row to fourth row of A. 
(d) Add —2 times second row to third row of A. 


35. (a) Multiply first row of A by 3. 
(b) Add 3 times third row to first row. 
(c) Subtract 2 times first row from second row. 
(d) Subtract 2 times second row from third row. 


36. (a) Put s-th row of A in r-th place, zeros elsewhere. 
(b) Interchange r-th and s-th rows, put zeros elsewhere. 
(c) Interchange r-th and s-th rows. 


37. (a) Add 3 times s-th row to r-th row. 
(b) Add c times s-th row to r-th row. 


II, §3, p. 69


1. Let $X = (x_1, \ldots, x_n)$. Then $X \cdot E_i = x_i$, so if this is 0 for all $i$ then $x_i = 0$
for all $i$.
3. $X \cdot (c_1A_1 + \cdots + c_nA_n) = c_1X \cdot A_1 + \cdots + c_nX \cdot A_n = 0$.


II, §4, p. 76


(There are several possible answers to the row echelon form, we give one of 
them. Others are also correct.) 


1. (a) /1 2 —5\ andalso /1 0 Z4 
0 9 —26 0 p 29 
0 0 0 0 0 0 
(b) /1 O 2\ andalso /1 0 0 
0 —{ -1 0 1 0 
0 Oo -i 0 0 1 
2.(a) [1 —2 3 —1 and also 1 0 0 i 
0. 3 -4 4 0O 1 0 -å 
0 0 7 ed 0 0 1 -# 
(D 2 0 —7 5X andalso /1 D- 22; 2 
0 1 3 —2 0 1 3 —2 
0 0 0 0 0 0 0 0 
3. (a) /1 2 -1 2 1\ or also /1 2 0 0 1
0 0 3 —6 1 0 0 1 0 0 
0 0 0-6 1 0 0 0 1 =i 
(b) /1 3 —1 2\ oralso /1 0 4 B 
0 11 —5 3 0 tas = 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
II, §5, p. 85
xu wm db (2 23 -H 
1. (a) ->| -4 -6 2] (».l1 19 —8 
20 5 
eI <2: %6 0 —10 5 
1 3. e oc l 5 —16 3 
(c) 1 ] 2 —3]| (d) : 0 2 
—2 —4 10 0 —2 1 
| 0 —19 0 
(e) ——[-32 -14 12 
7 
: 28 17 —20 
— 17 7 —9 
(f -—| 11-7 5 
13 —7 11 


2. The effect of multiplication by $I_{rs}$ is to put the $s$-th row in the $r$-th place, and
zeros elsewhere. Thus the $s$-th row of $I_{rs}A$ is 0. Multiplying by $I_{rs}$ once
more puts 0 in the $r$-th row, and 0 elsewhere, so $I_{rs}^2 = O$.


3. We have $E_{rs}(c) = I + cI_{rs}$ and $E_{rs}(c') = I + c'I_{rs}$ so

$$E_{rs}(c)E_{rs}(c') = (I + cI_{rs})(I + c'I_{rs})$$

$$= I + cI_{rs} + c'I_{rs} + cc'I_{rs}^2$$

$$= I + (c + c')I_{rs} \quad\text{because } I_{rs}^2 = O$$

$$= E_{rs}(c + c').$$
III, §1, p. 93 
1. Let $B$ and $C$ be perpendicular to $A_i$ for all $i$. Then

$$(B + C) \cdot A_i = B \cdot A_i + C \cdot A_i = 0 \quad\text{for all } i.$$

Also for any number $x$,

$$(xB) \cdot A_i = x(B \cdot A_i) = 0.$$

Finally $O \cdot A_i = 0$ for all $i$. This proves that $W$ is a subspace.


pA 
(c) Let $W$ be the set of all $(x, y)$ such that $x + 4y = 0$. Elements of $W$ are
then of the form $(-4y, y)$. Letting $y = 0$ shows that $(0, 0)$ is in $W$. If
$(-4y, y)$ and $(-4y', y')$ are in $W$, then their sum is $(-4(y + y'), y + y')$ and
so lies in $W$. If $c$ is a number, then $c(-4y, y) = (-4cy, cy)$, which lies in $W$.
Hence $W$ is a subspace.


. Let $v_1$, $v_2$ be in the intersection $U \cap W$. Then their sum $v_1 + v_2$ is both in $U$
(because $v_1$, $v_2$ are in $U$) and in $W$ (because $v_1$, $v_2$ are in $W$) so is in the
intersection $U \cap W$. We leave the other conditions to you.

Now let us prove partially that $U + W$ is a subspace. Let $u_1$, $u_2$ be
elements of $U$ and $w_1$, $w_2$ be elements of $W$. Then

$$(u_1 + w_1) + (u_2 + w_2) = u_1 + u_2 + w_1 + w_2,$$

and this has the form $u + w$, with $u = u_1 + u_2$ in $U$ and $w = w_1 + w_2$ in $W$.
So the sum of two elements in $U + W$ is also in $U + W$. We leave the other
conditions to the reader.


. Let $A$ and $B$ be perpendicular to all elements of $V$. Let $X$ be an element of $V$.
Then $(A + B) \cdot X = A \cdot X + B \cdot X = 0$, so $A + B$ is perpendicular to all elements
of $V$. Let $c$ be a number. Then $(cA) \cdot X = c(A \cdot X) = 0$, so $cA$ is perpendicular to
all elements of $V$. This proves that the set of elements of $\mathbf{R}^n$ perpendicular to all
elements of $V$ is a subspace.


III, §4, p. 109


2 


3. 
4. 


(a) A — B, (1, —1) (b) 3A + 3B, (3, 3) 

(c) A+ B, (1, 1) (d) 3A + 2B, (3, 2) 

(a) (3; =}, 1) (b) (1, 0, 1) (c) (5, -i, —$) 

Assume that $ad - bc \neq 0$. Let $A = (a, b)$ and $C = (c, d)$. Suppose we have

$$xA + yC = O.$$

This means in terms of coordinates

$$xa + yc = 0,$$
$$xb + yd = 0.$$

Multiply the first equation by $d$, the second by $c$ and subtract. We find

$$x(ad - bc) = 0.$$

Since $ad - bc \neq 0$ this implies that $x = 0$. A similar elimination shows that
$y = 0$. This proves (i).

Conversely, suppose $A$, $C$ are linearly independent. Then neither of them
can be $(0, 0)$ (otherwise pick $x, y \neq 0$, and get $xA + yC = O$ which is
impossible). Say $b$ or $d \neq 0$. Then

$$d(a, b) - b(c, d) = (ad - bc, 0).$$

Since $A$, $C$ are assumed linearly independent, the right-hand side cannot be 0,
so $ad - bc \neq 0$. The argument is similar if $a$ or $c \neq 0$.

For (iii) given an arbitrary vector $(s, t)$, solve the system of linear
equations arising from $xA + yC = (s, t)$ by elimination. You will find precisely
that you need $ad - bc \neq 0$ to do so.
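
The elimination in part (iii) can be written out explicitly. The Python sketch below solves $xA + yC = (s, t)$ with exact fractions. The closed-form expressions for $x$ and $y$ are what elimination gives when $ad - bc \neq 0$; they are spelled out here as an illustration and are not quoted from the text. The function names are hypothetical.

```python
from fractions import Fraction


def det2(A, C):
    """The number ad - bc for A = (a, b) and C = (c, d)."""
    (a, b), (c, d) = A, C
    return a * d - b * c


def solve(A, C, target):
    """Solve xA + yC = target by elimination; return None when ad - bc = 0."""
    (a, b), (c, d) = A, C
    s, t = target
    det = det2(A, C)
    if det == 0:
        return None                   # A and C are linearly dependent
    x = Fraction(s * d - t * c, det)  # d * (first equation) - c * (second)
    y = Fraction(a * t - b * s, det)  # a * (second equation) - b * (first)
    return x, y


independent = solve((1, 2), (3, 4), (5, 6))   # ad - bc = -2
dependent = solve((1, 2), (2, 4), (5, 6))     # ad - bc = 0
```

For the first call the result is $x = -1$, $y = 2$, and indeed $-(1, 2) + 2(3, 4) = (5, 6)$.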


. Look at Chapter I, §4, Exercise 7.


9. (3, 5) 


10. 


11. 


12. 


13. 


14. 


15. 


16. 


(—95, 3) 


— OR UR 00 
ossible basis: (0 oblo gbl, oblo 4 


$(E_{ij})$ where $E_{ij}$ has component 1 at the $(i, j)$ place and 0 otherwise. These
elements generate Mat$(m \times n)$, because given any matrix $A = (a_{ij})$ we can
write it as a linear combination

$$A = \sum_i \sum_j a_{ij}E_{ij}.$$

Furthermore, if

$$\sum_i \sum_j a_{ij}E_{ij} = O$$

then we must have $a_{ij} = 0$ for all indices $i$, $j$ so the elements $E_{ij}$ are linearly
independent.


$E_i$ where $E_i$ is the $n \times n$ matrix whose $ii$-th term is 1 and all other terms are
0.


A basis can be chosen to consist of the elements $E_{ij}$ having $ij$-component
equal to 1 for $i \leq j$ and all other components equal to 0. The number of
such elements is

$$1 + 2 + \cdots + n = \frac{n(n + 1)}{2}.$$


A basis for the space Sym$(n \times n)$ of symmetric $n \times n$ matrices can be taken
to be the elements $E_{ij}$ with $i \leq j$ having $ij$-component equal to 1,
$ji$-component equal to 1, and $rs$-component equal to 0 if $(r, s) \neq (i, j)$ or $(j, i)$. The
proof that these generate Sym$(n \times n)$ and are linearly independent is similar
to the proof in Exercise 12.


III, §5, p. 115


I. 
2. 


3. 


(a) 4 (b) $mn$ (c) $n$ (d) $n(n + 1)/2$ (e) 3 (f) 6 (g) $n(n + 1)/2$


0, 1, or 2, by Theorem 5.8. The subspace consists of $O$ alone if and only if it
has dimension 0. If the subspace has dimension 1, let $v_1$ be a basis. Then
the subspace consists of all elements $tv_1$, for all numbers $t$, so is a line by
definition. If the subspace has dimension 2, let $v_1$, $v_2$ be a basis. Then the
subspace consists of all elements $t_1v_1 + t_2v_2$ where $t_1$, $t_2$ are numbers, so is a
plane by definition.


0, 1, 2, or 3 by Theorem 5.8.


III, §6, p. 121


l. 


IV, 


l. 


3 
4. 
5. 
6. 
Ts 
8 


10. 
11. 


IV, 
I. 
2, 
5. 


6. 


(a) 2 (b) 2 (c)2 (d)! (e)2 (03 (g3 (h)2 (i) 2 
§1, p. 126 

(a) cos x (b) e* (c) l/x 2i (511/79. 1/./2) 

(a) 11 (b) 13 (c) 6 

(a) (e, 1) (b) (1, 0) (c) (I/e, —1) 

(a) (e+ 1, 3) (b) (e^ +2, 6) (c) (1, 0) 

(a) (2, 0) (b) (me, n) 

(a) 1 (b) 11 

Ellipse 9x* + 4y? = 36 9. Line x = 2y 

Circle x? + y? = e?, circle x? + y? = e* 

Cylinder, radius 1, z-axis = axis of cylinder 12. Circle x? + y^ = 1 
§2, p. 134 


All except (c), (g) 
Only Exercise 8. 


Since $AX = BX$ for all $X$ this relation is true in particular when $X = E^j$ is
the $j$-th unit vector. But then $AE^j = A^j$ is the $j$-th column of $A$, and
$BE^j = B^j$ is the $j$-th column of $B$, so $A^j = B^j$ for all $j$. This proves that
$A = B$.


Only $u = O$, because $T_u(O) = u$ and if $T_u$ is linear, then we must have
$T_u(O) = O$.
7.


13. 


The line S can be represented in the form P + tv, with all numbers t. Then 
L(S) consists of all points 


L(P) + tL(v,). 


If L(v,) = O, this is a single point. If L(v,) # O, this is a line. Other cases 
done similarly. 


. Parallelogram whose vertices are B, 3A, 3A + B, O. 

. Parallelogram whose vertices are 0, 2B, 5A, 5A + 2B. 
- (a) (-1, —1) (b) (2/5, 1) (c) (7-2, —1) 

- (a) (4, 5) (b) (11/3, —3) (c) (4, 2) 

12. 


Suppose we have a relation $\sum x_iv_i = O$. Apply $F$. We obtain $\sum x_iF(v_i) = \sum x_iw_i = O$.
Since the $w_i$'s are linearly independent it follows that all $x_i = 0$.


(a) Let $v$ be an arbitrary element of $V$. Since $F(v_0) \neq 0$ there exists a number
$c$ such that

$$F(v) = cF(v_0),$$

namely $c = F(v)/F(v_0)$. Then $F(v - cv_0) = 0$, so let $w = v - cv_0$. We have
written $v = w + cv_0$ as desired.

(b) $W$ is a subspace by Exercise 3. By part (a) the elements $v_0, v_1, \ldots, v_n$
generate $V$. Suppose there is a linear relation

$$c_0v_0 + c_1v_1 + \cdots + c_nv_n = O.$$

Apply $F$. We get $c_0F(v_0) = 0$. Since $F(v_0) \neq 0$ it follows that $c_0 = 0$. But
then $c_i = 0$ for $i = 1, \ldots, n$ because $v_1, \ldots, v_n$ form a basis of $W$.


IV, §3, p. 141


1 and 2. If $U$ is a subspace of $V$ then $\dim L(U) \leq \dim U$. Hence the image of
a one-dimensional subspace has dimension 0 or 1. The image of a
two-dimensional subspace has dimension 0, 1, or 2. A line or plane is of the form $P + U$,
where $U$ has dimension 1 or 2. Its image is of the form $L(P) + L(U)$, so
the assertions are now clear.


3. (a) By the dimension formula, the image of $F$ has dimension $n$. By Theorem
4.6 of Chapter III, the image must be all of $W$.
(b) is similar.


4. Use the dimension formula.


5. Since $L(v_0 + u) = L(v_0)$ if $u$ is in Ker $L$, every element of the form $v_0 + u$ is a
solution. Conversely, let $v$ be a solution of $L(v) = w$. Then

$$L(v - v_0) = L(v) - L(v_0) = w - w = O,$$

so $v - v_0 = u$ is in the kernel, and $v = v_0 + u$.
6. Constant functions.

7. Ker $D^2$ = polynomials of deg $\leq 1$, Ker $D^n$ = polynomials of deg $\leq n - 1$.
8. (a) Constant multiples of e* (b) Constant multiples of e^* 

9. (a) $n - 1$ (b) $n^2 - 1$

10. $A = \dfrac{A + {}^tA}{2} + \dfrac{A - {}^tA}{2}$. If $A = B + C = B_1 + C_1$, then

$$B - B_1 = C_1 - C.$$

But $B - B_1 = C_1 - C$ is both symmetric and skew-symmetric, so $O$ because
each component is equal to its own negative.


11. (c) Taking the transpose of $(A + {}^tA)/2$ show that this is a symmetric matrix.
Conversely, given a symmetric matrix $B$, we see that $B = P(B)$, so $B$ is in
the image of $P$.
(d) $n(n - 1)/2$.
(e) A basis for the skew-symmetric matrices consists of the matrices $E_{ij}$ with
$i < j$ having $ij$-component equal to 1, $ji$-component equal to $-1$, and all
other components equal to 0.


12. Similar to 11. 
13 and 14. Similar to 11 and 12. 


15. (a) $O$ (b) $m + n$, $\{(u_i, 0), (0, w_j)\}$; $i = 1, \ldots, m$; $j = 1, \ldots, n$. If $\{u_i\}$ is a basis
of $U$ and $\{w_j\}$ is a basis of $W$.


16. (b) The image is clearly contained in $U + W$. Given an arbitrary element
$u + w$ with $u$ in $U$ and $w$ in $W$, we can write it in the form
$u + w = u - (-w)$, which shows that it is in the image of $L$.
(c) The kernel of $L$ consists of those elements $(u, w)$ such that $u - w = O$, so
$u = w$. In other words, it consists of the pairs $(u, u)$, and $u$ must lie both
in $U$ and $W$, so in the intersection. If $\{u_1, \ldots, u_r\}$ is a basis for $U \cap W$,
then $\{(u_1, u_1), \ldots, (u_r, u_r)\}$ is a basis for the kernel of $L$. The dimension is
the same as the dimension of $U \cap W$. Then apply the dimension formula
in the text.


IV, §4, p. 149


1. $n - 1$ 2. 4 3. $n - 1$
4. (a) dim. = 1 basis = (1, — 1, 0) 
(b) dim. =2 basis = (1, 1, 0)(0, 1, 1) 


—3 n42 
(c) dim. = 1 basis = Lead E » | 
10 5 


(d) dim. = 0 
5.(a) 1 (b) 1 (0 0 (d)2 
6. One theorem states that

$$\dim V = \dim \operatorname{Im} L + \dim \operatorname{Ker} L.$$

Since $\dim \operatorname{Ker} L \geq 0$, the desired inequality follows.


7. One proof (there are others): rank $A = \dim \operatorname{Im} L_A$. But $L_{AB} = L_A \circ L_B$.
Hence the image of $L_{AB}$ is contained in the image of $L_A$. Hence
rank $AB \leq$ rank $A$.

For the other inequality, note that the rank of a matrix is equal to the
rank of its transpose, because column rank equals row rank. Hence
rank $AB$ = rank ${}^tB\,{}^tA$.

Now apply the first inequality to get rank ${}^tB\,{}^tA \leq$ rank ${}^tB$ = rank $B$.


IV, §5, p. 156


1000 A e 
Exe xc] [9:1 9:50 
00 10 


1 0 
(c) 31 (d) 71 (e) -I (f) : 


oO oco 
oO oco 


2. cl, where I is the unit n x n matrix. 


| -4 3 3 —2 1 
e ( 2 j e -1 ;) 


Q- F a 3 0 0 -2 0 1 
4.(a)/ 3-2 4| (lo -7 ol olo o o0 
ep X d 0 0 5 7 _1 0 


5. Let $Lv_i = \sum_j c_{ij}w_j$. Let $C = (c_{ij})$. The associated matrix is ${}^tC$, and the effect
of $L$ on a coordinate vector $X$ is ${}^tCX$.


€, 0 PM 0 T i 
BREI iy ig 

i sk 0 

a 000 


V, §1, p. 162 


1. Let $C = A - B$. Then $CX = O$ for all $X$. Take $X = E^j$ to be the $j$-th unit
vector for $j = 1, \ldots, n$. Then $CE^j = C^j$ is the $j$-th column of $C$. By
assumption, $CE^j = O$ for all $j$ so $C = O$.


2. Use distributivity and the fact that $F \circ L = L \circ F$.
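. Same proof as with numbers.


. $P^2 = \tfrac{1}{4}(I + T)^2 = \tfrac{1}{4}(I^2 + 2TI + T^2) = \tfrac{1}{4}(2I + 2T) = \tfrac{1}{2}(I + T) = P$. Part (b) is
left to you. For part (c), see the next problem.


(a) $Q^2 = (I - P)^2 = I^2 - 2IP + P^2 = I - 2P + P = I - P = Q$.


(b) Let $v \in \operatorname{Im} P$ so $v = Pw$ for some $w$. Then $Qv = QPw = 0$ because
$QP = (I - P)P = P - P^2 = P - P = O$. Hence $\operatorname{Im} P \subset \operatorname{Ker} Q$.
Conversely, let $v \in \operatorname{Ker} Q$ so $Qv = 0$. Then $(I - P)v = 0$ so $v - Pv = 0$, and
$v = Pv$, so $v \in \operatorname{Im} P$, whence $\operatorname{Ker} Q \subset \operatorname{Im} P$.


. Let $v \in V$. Then $v = v - Pv + Pv$, and $v - Pv \in \operatorname{Ker} P$ because

$$P(v - Pv) = Pv - P^2v = Pv - Pv = 0.$$

Furthermore $Pv \in \operatorname{Im} P$, thus proving (a).
As to (b), let $v \in \operatorname{Im} P \cap \operatorname{Ker} P$. Since $v \in \operatorname{Im} P$ there exists $w \in V$ such that
$v = Pw$. Since $v \in \operatorname{Ker} P$, we get

$$0 = Pv = P^2w = Pw = v,$$

so $v = 0$, whence (b) is also proved.


. Suppose $u + w = u_1 + w_1$. Then $u - u_1 = w_1 - w$. But $u - u_1 \in U$ and
$w_1 - w \in W$ because $U$, $W$ are subspaces. By assumption that $U \cap W = \{0\}$,
we conclude that $u - u_1 = 0 = w_1 - w$ so $u = u_1$, $w = w_1$.


. $P^2(u, w) = P(u, 0) = (u, 0) = P(u, w)$. So $P^2 = P$.


. The dimension of a subspace is $\leq$ the dimension of the space. Then

$$\operatorname{Im} F \circ L \subset \operatorname{Im} F \quad\text{so}\quad \dim \operatorname{Im} F \circ L \leq \dim \operatorname{Im} F$$

so rank $F \circ L \leq$ rank $F$. This proves one of the formulas. For the other,
view $F$ as a linear map defined on $\operatorname{Im} L$. Then

$$\operatorname{rank} F \circ L = \dim \operatorname{Im} F \circ L \leq \dim \operatorname{Im} L = \operatorname{rank} L.$$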


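V, §2, p. 168
1. $R_\theta^{-1} = R_{-\theta}$, because $R_\theta \circ R_{-\theta} = R_{\theta - \theta} = R_0 = I$. The matrix associated with
$R_\theta^{-1}$ is

$$\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$$

because $\cos(-\theta) = \cos\theta$ and $\sin(-\theta) = -\sin\theta$.


. The composites as follows are the identity:

$$F \circ G \circ G^{-1} \circ F^{-1} = I \quad\text{and}\quad G^{-1} \circ F^{-1} \circ F \circ G = I.$$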
4, 5, 6. In each case show that the kernel is $O$ and apply the appropriate
theorem.


7 through 10. The proof is similar to the same proof for matrices, using
distributivity. In 7, we have

$$(I - L) \circ (I + L) = I^2 - L^2 = I.$$

For 8, we have $L^2 + 2L = -I$ so $L(-L - 2I) = I$ so $L^{-1} = -L - 2I$.


11. It suffices to prove that $v$, $w$ are linearly independent. Suppose $xv + yw = O$.
Apply $L$. Then

$$L(w) = L(L(v)) = O$$

because $L^2 = O$. Hence $L(xv) = xL(v) = O$. Since $L(v) \neq O$, it follows that
$x = 0$. Then $y = 0$ because $w \neq O$.


12. (a) $F$ is injective, its kernel is $O$.
(b) $F$ is not surjective, for instance $(1, 0, 0, \ldots)$ is not in the image.
(c) Let $G(x_1, x_2, \ldots) = (x_2, x_3, \ldots)$ (drop the first coordinate).
(d) No, otherwise $F$ would have an inverse, which it does not.


13. Linearity is easily checked. To show that $L$ is injective, it suffices to show
that Ker $L = \{0\}$. Suppose $L(u, w) = 0$, then $u + w = 0$, so $u = -w$. By
assumption $U \cap W = \{0\}$, and $u \in U$, $-w \in W$ so $u = w = 0$. Hence Ker $L = \{0\}$.

$L$ is surjective because by assumption, every element of $V$ can be written
as the sum of an element of $U$ and an element of $W$.


VI, §1, p. 178


2. Let $X = {}^t(x, y)$. Then

$$\langle X, X\rangle = ax^2 + 2bxy + dy^2 = a\left(x + \frac{b}{a}y\right)^2 + \frac{ad - b^2}{a}y^2.$$

If $ad - b^2 > 0$, then $\langle X, X\rangle$ is a sum of squares with positive coefficients,
and one of the two terms is not 0, so $\langle X, X\rangle > 0$. If $ad - b^2 \leq 0$, then give
$y$ any non-zero value, and let $x = -by/a$. Then $\langle X, X\rangle \leq 0$.

If $a = 0$, let $y = 0$ and give $x$ any value. Then ${}^tXAX = 0$.


3. (a) yes (b) no (c) yes (d) no (e) yes (f) yes 
4. (a)2 (b)4 (c) 8 


5. The diagonal elements are unchanged under the transpose, so the trace of $A$
and ${}^tA$ is the same.


6. Let $AA = (c_{ij})$. Then

$$c_{ii} = \sum_{k=1}^{n} a_{ik}a_{ki}.$$



Hence

$$\operatorname{tr}(AA) = \sum_{i=1}^{n}\sum_{k=1}^{n} a_{ik}a_{ki},$$

and $a_{ik}a_{ki} = a_{ik}^2$ because $A$ is symmetric, so the trace is a sum of squares,
hence $\geq 0$.


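8. (a) $\operatorname{tr}(AB) = \sum_i \sum_j a_{ij}b_{ji} = \sum_i \sum_j b_{ji}a_{ij}$.

But any pair of letters can be used to denote indices in this sum, which
can be rewritten more neutrally

$$\sum_{r=1}^{n}\sum_{s=1}^{n} b_{rs}a_{sr}.$$

This is precisely the trace $\operatorname{tr}(BA)$.
(b) $\operatorname{tr}(C^{-1}AC) = \operatorname{tr}(ACC^{-1})$ by part (a), so $= \operatorname{tr}(A)$.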
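
Both trace identities can be checked on a small example. In the Python sketch below the matrices `A`, `B`, `C` are arbitrary integer samples chosen for illustration, and `C_inv` is the inverse of `C` written out by hand.

```python
def matmul(A, B):
    """Product of two n x n matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(M):
    """Sum of the diagonal elements."""
    return sum(M[i][i] for i in range(len(M)))


A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]
C = [[1, 2], [0, 1]]
C_inv = [[1, -2], [0, 1]]          # C_inv * C is the unit matrix

tr_AB = trace(matmul(A, B))
tr_BA = trace(matmul(B, A))
tr_conjugate = trace(matmul(C_inv, matmul(A, C)))
```

Here `tr_AB` and `tr_BA` agree, and `tr_conjugate` equals `trace(A)`, as parts (a) and (b) predict.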


VI, §2, p. 189


| 
(1, 1, —1) and — (1, 0, 1) 


2 


| 

“3 

L À anes 

v6 ae B 

: (1, 2, 1, 0) and —— (—1, —2, 5, 3) 
El 

= (1, 1, 0, 0), XI, —1, 1, D, 


: " (t? — 31/4), 3 t 
80 (t? — 31/4), ./3 t, 10t? — 12t + 3 


9. Use the dimension formulas. The trace is a linear map, from V to R. Since 


1. (a) 


(b) (2, 1, 1), 


l 


J18 


(2 2; 3, T) 


Eh s 


n wm 


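$$\dim V = \dim \operatorname{Ker} \operatorname{tr} + \dim \operatorname{Im} \operatorname{tr},$$

it follows that $\dim W = \dim V - 1$ so $\dim W^{\perp} = 1$ by Theorem 2.3. Let $I$ be
the unit $n \times n$ matrix. Then $\operatorname{tr}(A) = \operatorname{tr}(AI)$ for all $A \in V$, so $\operatorname{tr}(AI) = 0$ for
$A \in W$. Hence $I \in W^{\perp}$. Since $W^{\perp}$ has dimension 1, it follows that $I$ is a basis
of $W^{\perp}$. (Simple?)


10. We have $\langle X, AY\rangle = \langle X, bY\rangle = b\langle X, Y\rangle$ and also

$$\langle X, AY\rangle = \langle AX, Y\rangle = \langle aX, Y\rangle = a\langle X, Y\rangle.$$

So $a\langle X, Y\rangle = b\langle X, Y\rangle$. Since $a \neq b$ it follows that $\langle X, Y\rangle = 0$.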

VI, §3, p. 194


1. For all column vectors $X$, $Y$ we note that ${}^tXAY$ is a $1 \times 1$ matrix, so equal
to its own transpose. Therefore

$$\varphi_A(X, Y) = {}^tXAY = {}^t({}^tXAY) = {}^tY\,{}^tA\,{}^t({}^tX) = {}^tYAX = \varphi_A(Y, X).$$


2. Conversely, if $\varphi_A(X, Y) = \varphi_A(Y, X)$, then a similar argument as above shows
that

$${}^tXAY = {}^tX\,{}^tAY$$

for all $X$, $Y$. Then $A = {}^tA$ by the proof of uniqueness in Theorem 3.1.


3. (a) 2x,y, — 3xiy; + 4x3y1 + X3ya 
(b) 4x,y, + xiy; — 2x5y4 + 5x5ya 
(c) 5x,y, + 2x,ya + nx5y4 t 7x5y2 
(d) x,y, + 2x,y5 — X1Y3 — 3x5y,  X53ya + Ax5ys + 2x3y, + 5x3 Y2 — X33 
(e) —4x yy  2xiya + Xiya + 3X3,  X5ya + Xay3 + + 2x3yi + 5x3y2 + 


7X3 y3 
(f) —ix,y, + 2x,ya — 5xyya + x3y1 + $Xaya  Axzys — X3 y1 + 3x33 


VII, §2, p. 207


1. (a) —20 (b) 5 (c 4 (d)16 (e —76 (f) —14 
2. (a) —15 (b) 45 (c) O (d) O (e) 4 (f) 14 (g) 108 (h) 135 (i) 10 


3. 11025: An 4. 1 


5. Even in the $3 \times 3$ case, follow the general directives. Subtract the second
column times $x_1$ from the third. Subtract the first column times $x_1$ from the
second. You get

$$\begin{vmatrix} 1 & 0 & 0 \\ 1 & x_2 - x_1 & x_2^2 - x_2x_1 \\ 1 & x_3 - x_1 & x_3^2 - x_3x_1 \end{vmatrix}.$$

Expanding according to the first row yields

$$\begin{vmatrix} x_2 - x_1 & x_2(x_2 - x_1) \\ x_3 - x_1 & x_3(x_3 - x_1) \end{vmatrix}.$$

You can factor out $x_2 - x_1$ from the first row and $x_3 - x_1$ from the second
row to get

$$(x_2 - x_1)(x_3 - x_1)\begin{vmatrix} 1 & x_2 \\ 1 & x_3 \end{vmatrix}.$$

Then use the formula for $2 \times 2$ determinants to get the desired answer.
Now do the $4 \times 4$ case in detail to understand the pattern. Then do the
$n \times n$ case by induction.
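
The outcome of this computation is the product $(x_2 - x_1)(x_3 - x_1)(x_3 - x_2)$. The Python sketch below compares it with a direct expansion of the $3 \times 3$ determinant, under the assumption that the matrix in question has rows $(1, x_i, x_i^2)$, which is what the column operations above suggest. The function names are illustrative.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def det3(m):
    """Determinant of a 3 x 3 matrix, expanded according to the first row."""
    total = 0
    for j in range(3):
        # Delete the first row and the j-th column to form the minor.
        minor = [[m[i][k] for k in range(3) if k != j] for i in (1, 2)]
        total += (-1) ** j * m[0][j] * det2(minor)
    return total


def vandermonde(xs):
    """Matrix with rows (1, x, x^2), one row for each x in xs (assumed layout)."""
    return [[1, x, x * x] for x in xs]


def product_formula(x1, x2, x3):
    """(x2 - x1)(x3 - x1)(x3 - x2)."""
    return (x2 - x1) * (x3 - x1) * (x3 - x2)


direct = det3(vandermonde((2, 3, 5)))
factored = product_formula(2, 3, 5)
```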
6. (a) 3 (b) $-24$ (c) 16 (d) 14 (e) 0 (f) 8 (g) 40 (h) $-10$ (i) $\prod_{i=1}^{n} a_{ii}$


7. 1 8. t^-- 84 5 


11. $D(cA) = D(cA^1, cA^2, cA^3) = cD(A^1, cA^2, cA^3) = c^3D(A^1, A^2, A^3)$ using the
linearity with respect to each column.


12. $D(cA) = D(cA^1, \ldots, cA^n) = c^nD(A^1, \ldots, A^n)$ using again the linearity with respect
to each column.


VII, §3, p. 214 


1. 2 2. 2 3. 2 4. 3 5. 4 6. 3 7. 2 8. 3


VII, §4, p. 217


1 2 1 5 1 
La) x= —3, yI (b) aS Sa 


5 
(c) x= — 374, Y = g, Z= 
CEE pS mum b owe 


VII, §5, p. 221


1. Chapter II, §5
2. $\dfrac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$


VII, §6, p. 232
2. (a) 14 (b) 1
3. (a) 11 (b) 38 (c) 8 (d) 1
4. (a) 10 (b) 22 (c) 11 (d) 0


VIII, §1, p. 237


1. Let $\begin{pmatrix} x \\ y \end{pmatrix}$ be an eigenvector. Then matrix multiplication shows that we must
have $x + ay = \lambda x$ and $y = \lambda y$. If $y \neq 0$ then $\lambda = 1$. Since $a \neq 0$ this
contradicts the first equation, so $y = 0$. Then $E^1$, which is an eigenvector, forms a
basis for the space of eigenvectors.


2. If $A = cI$ is a scalar multiple of the identity, then the whole space consists of
eigenvectors. Any basis of the whole space answers the requirements. The 
only eigenvalue is c itself, for non-zero vectors. 


3. The unit vector $E^i$ is an eigenvector, with eigenvalue $a_{ii}$. The set of these
unit vectors is a basis for the whole space.


4. Let $y = 1$. Then using matrix multiplication, you find that ${}^t(x, 1)$ is an
eigenvector, with eigenvalue 1. If $\theta = 0$, the unit vectors $E^1$, $E^2$ are eigenvectors,
with eigenvalues 1 and -1 respectively.


5. Let $v_2 = {}^t(-1, x)$ so $v_2$ is perpendicular to $v_1$ in Exercise 4. Matrix
multiplication shows that $Av_2 = -v_2$.


6. Let $A = \begin{pmatrix} a & -b \\ b & a \end{pmatrix}$. Then the characteristic polynomial is $(t - a)^2 + b^2$, and
for it to be equal to 0 we must have $b = 0$, $t = a$. But $a^2 + b^2 = 1$ so $a = \pm 1$.


7. $A(Bv) = ABv = BAv = B\lambda v = \lambda Bv$.
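
The computation in Exercise 7 can be illustrated on a concrete pair of commuting matrices. The sketch below is only an example under stated assumptions: the matrices `A`, `B`, the vector `v`, and the eigenvalue `lam` are hypothetical sample data, and `matvec`, `matmul` are illustrative helper names.

```python
def matvec(M, v):
    """Product of the matrix M with the column vector v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def matmul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical commuting matrices: AB = BA.
A = [[2, 1], [1, 2]]
B = [[0, 1], [1, 0]]

# v is an eigenvector of A with eigenvalue lam.
v, lam = [1, -1], 1
Bv = matvec(B, v)

print(matmul(A, B) == matmul(B, A))             # the matrices commute
print(matvec(A, Bv), [lam * x for x in Bv])     # A(Bv) and lam * Bv
```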


VIII, §2, p. 249


1. (a) $(t - a_{11}) \cdots (t - a_{nn})$ (b) $a_{11}, \ldots, a_{nn}$


2. Same as in Exercise 1. 
2 
3. (a) (t —4)(t + 1); eigenvalues 4, —1; corresponding eigenvectors 6 and 


1 
( ) or non-zero scalar multiples. 


(b) $(t - 1)(t - 2)$; eigenvalues 1, 2; eigenvectors $(1, -1/\lambda)$ with $\lambda = 1$ and
$\lambda = 2$.
(c) $t^2 + 3$; eigenvalues $\pm\sqrt{-3}$; eigenvectors $(1, 1/(\lambda - 2))$ with $\lambda = \sqrt{-3}$
and $\lambda = -\sqrt{-3}$.


l 2 
(d) (t — 5)(t + 1); eigenvalues 5, —1; eigenvectors B and | d 


respectively, or non-zero scalar multiples. 


4. (a) (t — 1) (t — 2)(t — 3); eigenvalues 1, 2, 3; eigenvectors 


0 21/0 = 
1], 1 |, 1 
0 1 


respectively, or non-zero scalar multiples. 
(b) $(t - 4)(t + 2)^2$; eigenvalues 4, -2; eigenvectors:


1 for 4; 1 and 0 for —2. 


Non-zero scalar multiples of the first; linear combinations of the pair of
eigenvectors for -2 are also possible, or in general, the space of
solutions of the equation

$x - y + z = 0$.

(c) $(t - 2)^2(t - 6)$; eigenvalues 2, 6; eigenvectors:


l 1 1 
—| and 0 for 2; 2 for 6. 
0 —] 1 


Linear combinations of the first two are possible. Non-zero scalar 
multiples of the second are possible. 


(d) $(t - 1)(t - 3)^2$; eigenvalues 1, 3; eigenvectors:


2 l | 
—] for 1; 1 and 0 for 3. 
1 0 1 


Non-zero scalar multiples of the first are possible. Linear combinations 
of the second two are possible. 


5. (a) Eigenvalue 1; eigenvectors scalar multiples of ${}^t(1, 1)$.
(b) Eigenvalue 1; eigenvectors scalar multiples of ${}^t(1, 0)$.
(c) Eigenvalue 2; eigenvectors scalar multiples of ${}^t(0, 2)$.
(d) Eigenvalues 2, = (1 + Ja 3/2, 44-2 (1— J= 3)/2; Eigenvectors scalar 
multiples of '(1, (4 -- 1) !) with 4 = 4, or 2 = 4,. There is no real eigen- 
vector. 


6. Eigenvalues 1 in all cases. Eigenvectors scalar multiples of ${}^t(1, 0, 0)$.


7. (a) Eigenvalues $\pm 1$, $\pm\sqrt{-1}$. Eigenvectors scalar multiples of ${}^t(1, \lambda, \lambda^2, \lambda^3)$
where $\lambda$ is any one of the four eigenvalues. There are only two real
eigenvalues.


(b) Eigenvalues 2, (—1 + es 3/2, (—1 — pay Eigenvectors scalar mul- 


A+1)*+4 
tiples of (i C 2 , 4+1) where 4 is one of the three eigenvalues. 
There is only one real eigenvector namely ‘(1, 1, 3) up to real scalar 
multiples. 


9. Each root of the characteristic polynomial is an eigenvalue. Hence A has n 
distinct eigenvalues by assumption. By Theorem 1.2 n corresponding eigen- 
vectors are linearly independent. Since dim V = n by assumption, these eigen- 
vectors must be a basis. 


10. We assume from the chapter on determinants that the determinant of a
matrix is equal to the determinant of its transpose. Then

$\det(xI - A) = \det(xI - {}^tA)$,

so the roots of the characteristic polynomial of $A$ are the same as the roots
of the characteristic polynomial of ${}^tA$.


11. That $\lambda$ is an eigenvalue of $A$ means there exists a vector $v \neq O$ such that
$Av = \lambda v$. Since $A$ is assumed invertible, the kernel of $A$ is 0, so $\lambda \neq 0$. Apply
$A^{-1}$ to this last equation. We get

$v = A^{-1}(\lambda v) = \lambda A^{-1}v$,

whence $A^{-1}v = \lambda^{-1}v$, so $\lambda^{-1}$ is an eigenvalue of $A^{-1}$.


12. Let $f(t) = \sin t$ and $g(t) = \cos t$. Then

$Df = 0 + g$,
$Dg = -f + 0$.

Hence the matrix associated with $D$ with respect to the basis $\{f, g\}$ is

$A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$.

The characteristic polynomial is

$P(t) = \begin{vmatrix} t & 1 \\ -1 & t \end{vmatrix} = t^2 + 1$.

Since this polynomial has no root in $\mathbf{R}$, it follows that $A$, and hence $D$, has
no eigenvalue in the 2-dimensional space whose basis is $\{f, g\}$.


13. By calculus, $d^2(\sin kx)/dx^2 = -k^2 \sin kx$. This means that the function
$f(x) = \sin kx$ is an eigenvector of $D^2$, with eigenvalue $-k^2$. Similarly for the
function $g(x) = \cos kx$.


14. Let $v$ be an eigenvector of $A$, so that $Av = \lambda v$, $v \neq O$. Since $\{v_1, \ldots, v_n\}$ is a
basis of $V$, there exist numbers $a_1, \ldots, a_n$ such that

$v = a_1v_1 + \cdots + a_nv_n$.

Applying $A$ yields

$Av = \lambda v = a_1c_1v_1 + \cdots + a_nc_nv_n$.

But we also have

$\lambda v = \lambda a_1v_1 + \cdots + \lambda a_nv_n$.

Subtracting gives

$a_1(\lambda - c_1)v_1 + \cdots + a_n(\lambda - c_n)v_n = O$.

Since $v_1, \ldots, v_n$ are linearly independent, we must have

$a_i(\lambda - c_i) = 0$ for all $i = 1, \ldots, n$.

Say $a_1 \neq 0$. Then $\lambda - c_1 = 0$ so $\lambda = c_1$. Since we assumed that the numbers
$c_1, \ldots, c_n$ are distinct, it follows that $\lambda - c_j \neq 0$ for $j = 2, \ldots, n$ whence $a_j = 0$
for $j = 2, \ldots, n$. Hence finally $v = a_1v_1$. This concludes the proof.


15. Let $v \neq O$ be an eigenvector for $AB$ with eigenvalue $\lambda$, so that $ABv = \lambda v$.
Then

$BABv = \lambda Bv$.

If $Bv \neq O$ then $Bv$ is an eigenvector for $BA$ with this same eigenvalue $\lambda$. If on
the other hand $Bv = O$, then $\lambda = 0$. Furthermore, $BA$ cannot be invertible
(otherwise if $C$ is an inverse, $BAC = I$ so $B$ is invertible, which is not the
case). Hence there is a vector $w \neq O$ such that $BAw = O$, so 0 is also an
eigenvalue of $BA$. This proves that the eigenvalues of $AB$ are also eigenvalues
of $BA$.

There is an even better proof of a more general fact, namely: 


The characteristic polynomial of AB is the same as that of BA. 


To prove this, suppose first that B is invertible. Since the determinant of a 
product is the product of the determinants, we get: 


$\det(xI - AB) = \det(B(xI - AB)B^{-1}) = \det(xI - BABB^{-1}) = \det(xI - BA)$.

This proves the theorem assuming $B$ invertible. But the identity to be proved

$\det(xI - AB) = \det(xI - BA)$

is a polynomial identity, in which we can consider the components of $A$ and
$B$ as "variables", so if the identity is true when $A$ is fixed and $B = (b_{ij})$ is
"variable" then it is true for all $B$. A matrix with "variable" components is
invertible, so the previous argument applies to conclude the proof.
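
The statement that $AB$ and $BA$ have the same characteristic polynomial, even when $B$ is not invertible, can be tested on an example. The sketch below is an illustration, not the method of the text: it computes the coefficients of $\det(tI - M)$ with the Faddeev-LeVerrier recursion, a technique swapped in here for convenience, using exact fractions. The names `char_poly`, `matmul`, `P`, and `Q` are hypothetical.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(A):
    """Coefficients [c_0, ..., c_n] of det(tI - A), lowest degree first,
    computed by the Faddeev-LeVerrier recursion."""
    n = len(A)
    coeffs = [Fraction(0)] * (n + 1)
    coeffs[n] = Fraction(1)
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        # M_k = A M_{k-1} + c_{n-k+1} I
        M = [[AM[i][j] + (coeffs[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AMk = matmul(A, M)
        trace = sum(AMk[i][i] for i in range(n))
        coeffs[n - k] = -trace / k
    return coeffs

# Hypothetical example in which Q is singular, so the first argument
# of the text (which assumes an invertible factor) does not apply directly.
P = [[1, 2], [3, 4]]
Q = [[1, 1], [1, 1]]

print(char_poly(matmul(P, Q)) == char_poly(matmul(Q, P)))
```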


VIII, §3, p. 255


1. (a) 1, 3

(b) $(1 + \sqrt{5})/2$, $(1 - \sqrt{5})/2$


2. (a) 0, 1, 3

(b) $2$, $2 \pm \sqrt{2}$


The maximum value of f is the largest number in each case. 


$(-1 + \sqrt{74})/2$


VIII, §4, p. 259


1. $\sum a_ix_i^2$ if $a_1, \ldots, a_n$ are the diagonal elements.
2. Let $B$ have diagonal elements $\lambda_1^{1/2}, \ldots, \lambda_n^{1/2}$.
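
The square root described in Exercise 2 is easy to verify numerically. The following is a minimal sketch under the assumption that the given matrix is diagonal with positive diagonal elements; the sample values in `lams` and the helper names `diag` and `matmul` are hypothetical.

```python
import math

def diag(entries):
    """Diagonal matrix with the given diagonal elements."""
    n = len(entries)
    return [[entries[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical positive diagonal elements.
lams = [4.0, 9.0, 2.0]
A = diag(lams)
B = diag([math.sqrt(x) for x in lams])   # diagonal elements are the square roots
B2 = matmul(B, B)

# B^2 agrees with A up to floating-point rounding.
print(all(math.isclose(B2[i][j], A[i][j], abs_tol=1e-12)
          for i in range(3) for j in range(3)))
```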


3. Let $v$ be an eigenvector $\neq O$ with eigenvalue $\lambda$. Then $\langle Av, v \rangle = \langle \lambda v, v \rangle =
\lambda\langle v, v \rangle$. Since $\langle v, v \rangle > 0$ and $\langle Av, v \rangle > 0$, it follows that $\lambda > 0$. Pick a basis
of $V$ consisting of eigenvectors. The vector space $V$ can then be identified as
the space of coordinate vectors with respect to this basis. The matrix of $A$
then is a diagonal matrix, whose diagonal elements are the eigenvalues, and
are therefore positive. We can then use Exercise 2 to find a square root.


4. Similar to Exercise 3. 


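5. From ${}^t(AA) = {}^tA\,{}^tA = AA$, it follows that $A^2$ is symmetric. Furthermore, for
$v \neq O$,

$\langle A^2v, v \rangle = \langle Av, {}^tAv \rangle = \langle Av, Av \rangle > 0$,

because $Av \neq O$ since $\langle Av, v \rangle > 0$.

Since ${}^t(A^{-1}) = A^{-1}$ it follows that $A^{-1}$ is symmetric. Since $A$ is invertible,
a given $v$ can be written $v = Aw$ for some $w$ (namely $w = A^{-1}v$). Then

$\langle A^{-1}v, v \rangle = \langle A^{-1}Aw, Aw \rangle = \langle w, Aw \rangle = \langle Aw, w \rangle > 0$.

Hence $A^{-1}$ is positive definite.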
6. Assume (i). From the identity in the hint, we get

$4\langle Uv, Uw \rangle = \langle U(v + w), U(v + w) \rangle - \langle U(v - w), U(v - w) \rangle$
$= \langle v + w, v + w \rangle - \langle v - w, v - w \rangle$
$= 4\langle v, w \rangle$.

Hence $\langle Uv, Uw \rangle = \langle v, w \rangle$. The converse is immediate.
Assume (i). Then $\operatorname{Ker} U = O$ because if $Uv = O$ then $\|v\| = 0$ so $v = O$.
But a linear map with $O$ kernel from a finite dimensional vector space into
itself is invertible, so $U$ is invertible. Also, for all $v, w \in V$,

$\langle Uv, Uw \rangle = \langle {}^tUUv, w \rangle$ and also equals $\langle v, w \rangle$ by hypothesis.

Hence ${}^tUU = I$ so ${}^tU = U^{-1}$. Conversely, from ${}^tU = U^{-1}$ we get

$\langle Uv, Uv \rangle = \langle {}^tUUv, v \rangle = \langle v, v \rangle$,

so $U$ satisfies (i).
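
The equivalence in Exercise 6 can be illustrated with a rotation of the plane, which is a standard example of a map preserving the scalar product. The sketch below is an illustration only; the angle `theta`, the vectors `v`, `w`, and the helper names are hypothetical choices, and floating-point comparisons are made up to rounding.

```python
import math

def dot(v, w):
    """Ordinary scalar product of two vectors."""
    return sum(a * b for a, b in zip(v, w))

def matvec(M, v):
    """Product of the matrix M with the column vector v."""
    return [dot(row, v) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical example: rotation by the angle theta.
theta = 0.7
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]
v, w = [1.0, 2.0], [-3.0, 0.5]

# <Uv, Uw> agrees with <v, w> up to rounding.
print(dot(matvec(U, v), matvec(U, w)), dot(v, w))
# tU U is the identity up to rounding, that is tU = U^{-1}.
print(matmul(transpose(U), U))
```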




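7. Since ${}^t({}^tAA) = {}^tA\,{}^t({}^tA) = {}^tAA$, it follows that ${}^tAA$ is symmetric. Furthermore, for
$v \neq O$, we have

$\langle {}^tAAv, v \rangle = \langle Av, Av \rangle > 0$

because $A$ is invertible, $Av \neq O$. Hence ${}^tAA$ is positive definite. Let $U =
AB^{-1}$ where $B^2 = {}^tAA$ and $BA = AB$, so $B^{-1}A = AB^{-1}$. Then

$\langle Uv, Uv \rangle = \langle AB^{-1}v, AB^{-1}v \rangle = \langle B^{-1}Av, B^{-1}Av \rangle$
$= \langle Av, {}^tB^{-1}B^{-1}Av \rangle$
$= \langle v, {}^tA({}^tAA)^{-1}Av \rangle = \langle v, v \rangle$.

Hence $U$ is unitary.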


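8. Let $v$ be a non-zero eigenvector with eigenvalue $\lambda > 0$. Then

$\langle Bv, Bv \rangle = \lambda^2\langle v, v \rangle = \langle v, v \rangle$

because $B$ is unitary. Hence $\lambda^2 = 1$. Hence $\lambda = \pm 1$, and since $\lambda$ is positive,
$\lambda = 1$. Since $V$ has a basis consisting of eigenvectors, it follows that $B = I$.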
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Differential equations 138, 258 
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Eigenvector 136, 233 

Element 88 

Elementary matrix 63, 77, 80 
Elementary row operation 71 
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Imaginary 251 
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Kernel 136 
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Linear mapping 127 
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Maximal set of linearly independent 
vectors 113 

Maximum 251 
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Non-trivial solution 67 
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Orthogonal complement 146 
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Parallel planes 36 
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Plane 34, 95, 112 
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Product space 243 
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Row operation 70, 116 
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Scalar product 12, 171 
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Spanned 102 
Sphere 19 
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Sum of matrices 44 
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Upper triangular matrix 62, 111, 115 
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Vector 10, 
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Vector space 88 
Volume 229 
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Zero functions 90 
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Zero matrix 44 
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This book is a short text in linear algebra, intended for a one-term 
course. In the first chapter, Lang discusses the relation between the 
geometry and the algebra underlying the subject, and gives concrete 
examples of the notions which appear later in the book. He then starts 
with a discussion of linear equations, matrices and gaussian elimina- 
tion, and proceeds to discuss vector spaces, linear maps, scalar prod- 
ucts, determinants, and eigenvalues. The book contains a large 
number of exercises, some of the routine computational type, and 
others more conceptual. 
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